Bio


1977 B.A., Chemistry and Biology, University of Rochester, NY
1978-1982 Ph.D. California Institute of Technology, CA Advisor: Dr. Norman Davidson
1982-1986 Postdoctoral Research Stanford University School of Medicine, CA Advisor: Dr. Ronald Davis
1986-2009 Faculty Dept of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT
2009-present Dept of Genetics, Stanford University School of Medicine, Stanford, CA

Administrative Appointments


  • Chair, Dept. of Genetics (2009 - Present)
  • Director, Center for Genomics and Personalized Medicine (2009 - Present)

Honors & Awards


  • George Beadle Award, GSA (2019)
  • Elected Member, American Academy of Arts and Sciences (2015)
  • Stanford B. Ascherman Professor, Stanford (2011)
  • Pioneer Award, Human Proteome Organization (2009)
  • CT Medal of Science, Connecticut Academy of Science (2007)
  • Pew Scholar Award, Pew Foundation (1987-1991)

Current Research and Scholarly Interests


We are presently in an omics revolution in which genomes and other omes can be readily characterized. Our laboratory uses a variety of approaches to analyze genomes and regulatory networks. Our research focuses on humans and on yeast, a model organism ideally suited to genetic analysis.

1) Transcriptomes
To annotate genomes, we developed RNA sequencing and used it to annotate the yeast and human transcriptomes. We discovered that the eukaryotic transcriptome is much more complex than previously appreciated and that embryonic stem cells have more transcript isoforms than differentiated cells.

2) Transcription Factor Binding Networks
We have also developed methods for mapping transcription factor binding sites throughout the genome. We have used these methods to build regulatory maps and to help decipher the combinatorial regulatory code: which factors work together to regulate which genes. Using this approach, we have mapped pathways crucial for metabolism and inflammation.

3) Integrated Regulatory Networks
In addition to transcription factor binding networks, we have also been mapping phosphorylation and metabolite-protein interaction networks. These studies have revealed novel global regulators and key points in integrated regulatory networks.

4) Variation
We have been analyzing differences between individuals and species at two levels: DNA sequence variation and variation in regulatory information. We developed paired-end sequencing for humans and found that humans have extensive structural variation (SV), i.e., deletions, insertions and inversions. This is likely to be a major cause of phenotypic variation and human disease. In addition, by mapping binding site differences among different yeast strains and among humans, we have found that individuals differ much more in their regulatory information than in their coding sequences. We can correlate these differences with SNPs and SVs, thereby associating noncoding DNA differences with regulatory information.

5) Human Disease
Finally, we are applying omics approaches, including genome sequencing, transcriptomics, proteomics, metabolomics, DNA methylation and microbiome assays, to the analysis of human disease. These integrative approaches are helping us understand the molecular basis of disease and develop diagnostics and therapeutics.

Clinical Trials


  • Multiomic Signatures of Microbial Metabolites Following Prebiotic Fiber Supplementation (Recruiting)

    The investigators propose a comprehensive, multiomic study that will integrate longitudinal data associating changes in specific gut bacteria and in the host in response to prebiotic fiber supplementation. These data will guide the development of an integrative biological signature relating bacterially derived metabolites to biological outcomes in the host. The open sharing of data generated by the proposed research represents a significant public resource that will support and accelerate future studies.


  • Precision Diets for Diabetes Prevention (Recruiting)

    With this study, the investigators want to understand the physiological differences among people developing prediabetes and diabetes. The investigators hypothesize that different individuals follow different paths in the development of the disease. By understanding each person's mechanism for developing disease, the investigators aim to find a personalized approach to prevent that development. The investigators also hope to find a biomarker that pinpoints the particular defect, so that the problem can be diagnosed at an earlier stage and personalized diet recommendations can be given to prevent the development of diabetes more effectively.


  • The 28 Day Challenge (Not Recruiting)

    The purpose of this study is to determine how a 28 Day Challenge influences mental health and well-being. This is a blinded study. Participants both with and without depression and anxiety will be included. A moderation analysis will be performed to see whether changes in depression after the intervention are a function of baseline depression and anxiety levels.

    Stanford is currently not accepting patients for this trial. For more information, please contact Ariel Ganz, PhD, 650-736-8099.


  • The Lasting Change Study (Not Recruiting)

    The study approach is to leverage cutting-edge techniques in multi-omics biology, wearable physiology, and real-time digital psychological profiling, together with machine learning models, to understand the mechanisms underlying the strategies and techniques that give participants the power to initiate and maintain sustainable behavior change. Over the years, millions of people worldwide have attended immersive personal development seminars aiming to improve participants' health behaviors and wellness. Nevertheless, there is a scarcity of large-scale studies that assess their effects on behavior change and investigate their mechanism of action. A recent publication by the Science of Behavior Change Program (SOBC), launched by the National Institutes of Health (NIH), recognized that: "science has not yet delivered a unified understanding of basic mechanisms of behavior change across a broad range of health-related behaviors, limiting progress in the development and translation of effective and efficacious behavioral intervention." As such, understanding the mechanisms underlying sustainable behavior change is key. The Date With Destiny (DWD) seminar is among the largest worldwide, and tens of thousands of people have already attended and testified to its transformative effect. The main objective of the study is to uncover the underlying mechanism of behavior change through longitudinal collection of psychometric (Ecological Momentary Assessments), physiological (wearables), and biological (multi-omics) data from study participants. The study's specific objectives are: (1) to evaluate the impact of DWD on sustainable behavior change; (2) to investigate the mechanism of behavior change by collecting longitudinal real-time measurements of psychometric (e.g., Ecological Momentary Assessments [EMA]), physiological (e.g., heart rate, blood oxygen level, breathing rate, and EDA), and biological (multi-omics analyses) features in study participants; (3) to assess the effect of DWD on professional fulfillment, resilience, and mental wellness.

    Stanford is currently not accepting patients for this trial.


  • Understanding and Diagnosing Allergic Disease in Twins (Not Recruiting)

    The purpose of this study is to gain better understanding of how the immune system works in twins with and without allergic disease. Healthy volunteers are not specifically targeted. Healthy non-allergic study participants may be found through the course of evaluation for the presence of allergies.

    Stanford is currently not accepting patients for this trial. For more information, please contact Kari A Nadeau, MD, PhD, 650-521-7237.


All Publications


  • Longitudinal profiling of the microbiome at four body sites reveals core stability and individualized dynamics during health and disease. Cell host & microbe Zhou, X., Shen, X., Johnson, J. S., Spakowicz, D. J., Agnello, M., Zhou, W., Avina, M., Honkala, A., Chleilat, F., Chen, S. J., Cha, K., Leopold, S., Zhu, C., Chen, L., Lyu, L., Hornburg, D., Wu, S., Zhang, X., Jiang, C., Jiang, L., Jiang, L., Jian, R., Brooks, A. W., Wang, M., Contrepois, K., Gao, P., Rose, S. M., Tran, T. D., Nguyen, H., Celli, A., Hong, B. Y., Bautista, E. J., Dorsett, Y., Kavathas, P. B., Zhou, Y., Sodergren, E., Weinstock, G. M., Snyder, M. P. 2024

    Abstract

    To understand the dynamic interplay between the human microbiome and host during health and disease, we analyzed the microbial composition, temporal dynamics, and associations with host multi-omics, immune, and clinical markers of microbiomes from four body sites in 86 participants over 6 years. We found that microbiome stability and individuality are body-site specific and heavily influenced by the host. The stool and oral microbiome are more stable than the skin and nasal microbiomes, possibly due to their interaction with the host and environment. We identify individual-specific and commonly shared bacterial taxa, with individualized taxa showing greater stability. Interestingly, microbiome dynamics correlate across body sites, suggesting systemic dynamics influenced by host-microbial-environment interactions. Notably, insulin-resistant individuals show altered microbial stability and associations among microbiome, molecular markers, and clinical features, suggesting their disrupted interaction in metabolic disease. Our study offers comprehensive views of multi-site microbial dynamics and their relationship with host health and disease.

    View details for DOI 10.1016/j.chom.2024.02.012

    View details for PubMedID 38479397

  • The importance, challenges, and possible solutions for sharing proteomics data while safeguarding individuals' privacy. Molecular & cellular proteomics : MCP Shome, M., MacKenzie, T. M., Subbareddy, S. R., Snyder, M. P. 2024: 100731

    Abstract

    Proteomics data sharing has profound benefits at the individual level as well as the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident, as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Datasets that would be useless without personal identifiers need to be shared in a controlled-access repository so that only authorized researchers can access the data and personal identifiers are kept safe.

    View details for DOI 10.1016/j.mcpro.2024.100731

    View details for PubMedID 38331191

  • Digital health application integrating wearable data and behavioral patterns improves metabolic health. NPJ digital medicine Zahedani, A. D., Veluvali, A., McLaughlin, T., Aghaeepour, N., Hosseinian, A., Agarwal, S., Ruan, J., Tripathi, S., Woodward, M., Hashemi, N., Snyder, M. 2023; 6 (1): 216

    Abstract

    The effectiveness of lifestyle interventions in reducing caloric intake and increasing physical activity for preventing Type 2 Diabetes (T2D) has been previously demonstrated. The use of modern technologies can potentially further improve the success of these interventions, promote metabolic health, and prevent T2D at scale. To test this concept, we built a remote program that uses continuous glucose monitoring (CGM) and wearables to make lifestyle recommendations that improve health. We enrolled 2,217 participants with varying glucose levels (normal, prediabetes, and T2D ranges), using CGM over 28 days to capture glucose patterns. Participants logged food intake, physical activity, and body weight via a smartphone app that integrated wearables data and provided daily insights, including overlaying glucose patterns with activity and food intake, macronutrient breakdown, glycemic index (GI), glycemic load (GL), and activity measures. The app furthermore provided personalized recommendations based on users' preferences, goals, and observed glycemic patterns. Users could interact with the app for an additional 2 months without CGM. Here we report significant improvements in hyperglycemia, glucose variability, and hypoglycemia, particularly in those who were not diabetic at baseline. Body weight decreased in all groups, especially those who were overweight or obese. Healthy eating habits improved significantly, with reduced daily caloric intake and carbohydrate-to-calorie ratio and increased intake of protein, fiber, and healthy fats relative to calories. These findings suggest that lifestyle recommendations, in addition to behavior logging and CGM data integration within a mobile app, can enhance the metabolic health of both nondiabetic and T2D individuals, leading to healthier lifestyle choices. This technology can be a valuable tool for T2D prevention and treatment.

    View details for DOI 10.1038/s41746-023-00956-y

    View details for PubMedID 38001287

    View details for PubMedCentralID 3891203

  • Wearable Devices: Implications for Precision Medicine and the Future of Health Care. Annual review of medicine Babu, M., Lautman, Z., Lin, X., Sobota, M. H., Snyder, M. P. 2023

    Abstract

    Wearable devices are integrated analytical units equipped with sensitive physical, chemical, and biological sensors capable of noninvasive and continuous monitoring of vital physiological parameters. Recent advances in disciplines including electronics, computation, and material science have resulted in affordable and highly sensitive wearable devices that are routinely used for tracking and managing health and well-being. Combined with longitudinal monitoring of physiological parameters, wearables are poised to transform the early detection, diagnosis, and treatment/management of a range of clinical conditions. Smartwatches are the most commonly used wearable devices and have already demonstrated valuable biomedical potential in detecting clinical conditions such as arrhythmias, Lyme disease, inflammation, and, more recently, COVID-19 infection. Despite significant clinical promise shown in research settings, there remain major hurdles in translating the medical uses of wearables to the clinic. There is a clear need for more effective collaboration among stakeholders, including users, data scientists, clinicians, payers, and governments, to improve device security, user privacy, data standardization, regulatory approval, and clinical validity. This review examines the potential of wearables to offer affordable and reliable measures of physiological status that are on par with FDA-approved specialized medical devices. We briefly examine studies where wearables proved critical for the early detection of acute and chronic clinical conditions with a particular focus on cardiovascular disease, viral infections, and mental health. Finally, we discuss current obstacles to the clinical implementation of wearables and provide perspectives on their potential to deliver increasingly personalized proactive health care across a wide variety of conditions. Expected final online publication date for the Annual Review of Medicine, Volume 75 is January 2024. 

    View details for DOI 10.1146/annurev-med-052422-020437

    View details for PubMedID 37983384

  • Dietary fiber deficiency in individuals with metabolic syndrome: a review. Current opinion in clinical nutrition and metabolic care Veluvali, A., Snyder, M. 2023

    Abstract

    PURPOSE OF REVIEW: Metabolic syndrome (MetS) refers to a group of risk factors, which increase the risk of cardiovascular disease (CVD), type 2 diabetes (T2D), and other chronic diseases. Dietary fiber has been shown to mitigate many of the effects of various risk factors associated with MetS. Our review summarizes the recent findings on the association between dietary fiber deficiency and MetS. RECENT FINDINGS: A number of studies have shown that dietary fiber deficiency is associated with an increased risk of MetS. The main mechanisms by which dietary fiber may reduce the risk of MetS include reduction of cholesterol levels; improvement of blood sugar control; reduction of inflammation; and promotion of weight loss. SUMMARY: Literature suggests that a deficiency in dietary fiber consumption is a risk factor for MetS. An increase in dietary fiber intake may help to reduce the risk of developing MetS and its associated chronic diseases.

    View details for DOI 10.1097/MCO.0000000000000971

    View details for PubMedID 37751374

  • Dynamic lipidome alterations associated with human health, disease and ageing. Nature metabolism Hornburg, D., Wu, S., Moqri, M., Zhou, X., Contrepois, K., Bararpour, N., Traber, G. M., Su, B., Metwally, A. A., Avina, M., Zhou, W., Ubellacker, J. M., Mishra, T., Schüssler-Fiorenza Rose, S. M., Kavathas, P. B., Williams, K. J., Snyder, M. P. 2023

    Abstract

    Lipids can be of endogenous or exogenous origin and affect diverse biological functions, including cell membrane maintenance, energy management and cellular signalling. Here, we report >800 lipid species, many of which are associated with health-to-disease transitions in diabetes, ageing and inflammation, as well as cytokine-lipidome networks. We performed comprehensive longitudinal lipidomic profiling and analysed >1,500 plasma samples from 112 participants followed for up to 9 years (average 3.2 years) to define the distinct physiological roles of complex lipid subclasses, including large and small triacylglycerols, ester- and ether-linked phosphatidylethanolamines, lysophosphatidylcholines, lysophosphatidylethanolamines, cholesterol esters and ceramides. Our findings reveal dynamic changes in the plasma lipidome during respiratory viral infection, insulin resistance and ageing, suggesting that lipids may have roles in immune homoeostasis and inflammation regulation. Individuals with insulin resistance exhibit disturbed immune homoeostasis, altered associations between lipids and clinical markers, and accelerated changes in specific lipid subclasses during ageing. Our dataset based on longitudinal deep lipidome profiling offers insights into personalized ageing, metabolic health and inflammation, potentially guiding future monitoring and intervention strategies.

    View details for DOI 10.1038/s42255-023-00880-1

    View details for PubMedID 37697054

    View details for PubMedCentralID 7736650

  • Biomarkers of aging for the identification and evaluation of longevity interventions. Cell Moqri, M., Herzog, C., Poganik, J. R., Biomarkers of Aging Consortium, Justice, J., Belsky, D. W., Higgins-Chen, A., Moskalev, A., Fuellen, G., Cohen, A. A., Bautmans, I., Widschwendter, M., Ding, J., Fleming, A., Mannick, J., Han, J. J., Zhavoronkov, A., Barzilai, N., Kaeberlein, M., Cummings, S., Kennedy, B. K., Ferrucci, L., Horvath, S., Verdin, E., Maier, A. B., Snyder, M. P., Sebastiano, V., Gladyshev, V. N. 2023; 186 (18): 3758-3775

    Abstract

    With the rapid expansion of aging biology research, the identification and evaluation of longevity interventions in humans have become key goals of this field. Biomarkers of aging are critically important tools in achieving these objectives over realistic time frames. However, the current lack of standards and consensus on the properties of a reliable aging biomarker hinders their further development and validation for clinical applications. Here, we advance a framework for the terminology and characterization of biomarkers of aging, including classification and potential clinical use cases. We discuss validation steps and highlight ongoing challenges as potential areas in need of future research. This framework sets the stage for the development of valid biomarkers of aging and their ultimate utilization in clinical trials and practice.

    View details for DOI 10.1016/j.cell.2023.08.003

    View details for PubMedID 37657418

  • Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Gonçalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W. J., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., Börner, K., Snyder, M. P. 2023

    Abstract

    The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.

    View details for DOI 10.1038/s41556-023-01194-w

    View details for PubMedID 37468756

    View details for PubMedCentralID 8238499

  • Organization of the human intestine at single-cell resolution. Nature Hickey, J. W., Becker, W. R., Nevins, S. A., Horning, A., Perez, A. E., Zhu, C., Zhu, B., Wei, B., Chiu, R., Chen, D. C., Cotter, D. L., Esplin, E. D., Weimer, A. K., Caraccio, C., Venkataraaman, V., Schürch, C. M., Black, S., Brbić, M., Cao, K., Chen, S., Zhang, W., Monte, E., Zhang, N. R., Ma, Z., Leskovec, J., Zhang, Z., Lin, S., Longacre, T., Plevritis, S. K., Lin, Y., Nolan, G. P., Greenleaf, W. J., Snyder, M. 2023; 619 (7970): 572-584

    Abstract

    The intestine is a complex organ that promotes digestion, extracts nutrients, participates in immune surveillance, maintains critical symbiotic relationships with microbiota and affects overall health. The intestine has a length of over nine metres, along which there are differences in structure and function. The localization of individual cell types, cell type development trajectories and detailed cell transcriptional programs probably drive these differences in function. Here, to better understand these differences, we evaluated the organization of single cells using multiplexed imaging and single-nucleus RNA and open chromatin assays across eight different intestinal sites from nine donors. Through systematic analyses, we find cell compositions that differ substantially across regions of the intestine and demonstrate the complexity of epithelial subtypes, and find that the same cell types are organized into distinct neighbourhoods and communities, highlighting distinct immunological niches that are present in the intestine. We also map gene regulatory differences in these cells that are suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation and organization for this organ, and serve as an important reference map for understanding human biology and disease.

    View details for DOI 10.1038/s41586-023-05915-x

    View details for PubMedID 37468586

    View details for PubMedCentralID PMC10356619

  • Dynamic monitoring of thousands of biochemical analytes using microsampling. Nature biomedical engineering Kellogg, R., Snyder, M. 2023

    View details for DOI 10.1038/s41551-023-01005-5

    View details for Web of Science ID 000920584900001

    View details for PubMedID 36697922

  • Multi-omics microsampling for the profiling of lifestyle-associated changes in health. Nature biomedical engineering Shen, X., Kellogg, R., Panyard, D. J., Bararpour, N., Castillo, K. E., Lee-McMullen, B., Delfarah, A., Ubellacker, J., Ahadi, S., Rosenberg-Hasson, Y., Ganz, A., Contrepois, K., Michael, B., Simms, I., Wang, C., Hornburg, D., Snyder, M. P. 2023

    Abstract

    Current healthcare practices are reactive and use limited physiological and clinical information, often collected months or years apart. Moreover, the discovery and profiling of blood biomarkers in clinical and research settings are constrained by geographical barriers, the cost and inconvenience of in-clinic venepuncture, low sampling frequency and the low depth of molecular measurements. Here we describe a strategy for the frequent capture and analysis of thousands of metabolites, lipids, cytokines and proteins in 10 μl of blood alongside physiological information from wearable sensors. We show the advantages of such frequent and dense multi-omics microsampling in two applications: the assessment of the reactions to a complex mixture of dietary interventions, to discover individualized inflammatory and metabolic responses; and deep individualized profiling, to reveal large-scale molecular fluctuations as well as thousands of molecular relationships associated with intra-day physiological variations (in heart rate, for example) and with the levels of clinical biomarkers (specifically, glucose and cortisol) and of physical activity. Combining wearables and multi-omics microsampling for frequent and scalable omics may facilitate dynamic health profiling and biomarker discovery.

    View details for DOI 10.1038/s41551-022-00999-8

    View details for PubMedID 36658343

  • Recurrent repeat expansions in human cancer genomes. Nature Erwin, G. S., Gursoy, G., Al-Abri, R., Suriyaprakash, A., Dolzhenko, E., Zhu, K., Hoerner, C. R., White, S. M., Ramirez, L., Vadlakonda, A., Vadlakonda, A., von Kraut, K., Park, J., Brannon, C. M., Sumano, D. A., Kirtikar, R. A., Erwin, A. A., Metzner, T. J., Yuen, R. K., Fan, A. C., Leppert, J. T., Eberle, M. A., Gerstein, M., Snyder, M. P. 2022

    Abstract

    Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases. However, repeat expansions are often not explored beyond neurological and neurodegenerative disorders. In some cancers, mutations accumulate in short tracts of TRs, a phenomenon termed microsatellite instability; however, larger repeat expansions have not been systematically analysed in cancer. Here we identified TR expansions in 2,622 cancer genomes spanning 29 cancer types. In seven cancer types, we found 160 recurrent repeat expansions (rREs), most of which (155/160) were subtype specific. We found that rREs were non-uniformly distributed in the genome with enrichment near candidate cis-regulatory elements, suggesting a potential role in gene regulation. One rRE, a GAAA-repeat expansion, located near a regulatory element in the first intron of UGT2B7 was detected in 34% of renal cell carcinoma samples and was validated by long-read DNA sequencing. Moreover, in preliminary experiments, treating cells that harbour this rRE with a GAAA-targeting molecule led to a dose-dependent decrease in cell proliferation. Overall, our results suggest that rREs may be an important but unexplored source of genetic variation in human cancer, and we provide a comprehensive catalogue for further study.

    View details for DOI 10.1038/s41586-022-05515-1

    View details for PubMedID 36517591

  • Distinct factors associated with short-term and long-term weight loss induced by low-fat or low-carbohydrate diet intervention. Cell reports. Medicine Li, X., Perelman, D., Leong, A. K., Fragiadakis, G., Gardner, C. D., Snyder, M. P. 2022: 100870

    Abstract

    To understand what determines the success of short- and long-term weight loss, we conduct a secondary analysis of dietary, metabolic, and molecular data collected from 609 participants before, during, and after a 1-year weight-loss intervention with either a healthy low-carbohydrate (HLC) or a healthy low-fat (HLF) diet. Through systematic analysis of multidomain datasets, we find that dietary adherence and diet quality, not just caloric restriction, are important for short-term weight loss in both diets. Interestingly, we observe minimal dietary differences between those who succeeded in long-term weight loss and those who did not. Instead, proteomic and gut microbiota signatures significantly differ between these two groups at baseline. Moreover, the baseline respiratory quotient may suggest a specific diet for better weight-loss outcomes. Overall, the identification of these dietary, molecular, and metabolic factors, common or unique to the HLC and HLF diets, provides a roadmap for developing individualized weight-loss strategies.

    View details for DOI 10.1016/j.xcrm.2022.100870

    View details for PubMedID 36516846

  • Longitudinally tracking personal physiomes for precision management of childhood epilepsy. PLOS digital health Jiang, P., Gao, F., Liu, S., Zhang, S., Zhang, X., Xia, Z., Zhang, W., Jiang, T., Zhu, J. L., Zhang, Z., Shu, Q., Snyder, M., Li, J. 2022; 1 (12): e0000161

    Abstract

    Our current understanding of human physiology and activities is largely derived from sparse and discrete individual clinical measurements. To achieve precise, proactive, and effective health management of an individual, longitudinal, and dense tracking of personal physiomes and activities is required, which is only feasible by utilizing wearable biosensors. As a pilot study, we implemented a cloud computing infrastructure to integrate wearable sensors, mobile computing, digital signal processing, and machine learning to improve early detection of seizure onsets in children. We recruited 99 children diagnosed with epilepsy and longitudinally tracked them at single-second resolution using a wearable wristband, and prospectively acquired more than one billion data points. This unique dataset offered us an opportunity to quantify physiological dynamics (e.g., heart rate, stress response) across age groups and to identify physiological irregularities upon epilepsy onset. The high-dimensional personal physiome and activity profiles displayed a clustering pattern anchored by patient age groups. These signatory patterns included strong age and sex-specific effects on varying circadian rhythms and stress responses across major childhood developmental stages. For each patient, we further compared the physiological and activity profiles associated with seizure onsets with the personal baseline and developed a machine learning framework to accurately capture these onset moments. The performance of this framework was further replicated in another independent patient cohort. We next referenced our predictions with the electroencephalogram (EEG) signals on selected patients and demonstrated that our approach could detect subtle seizures not recognized by humans and could detect seizures prior to clinical onset. 
    Our work demonstrated the feasibility of a real-time mobile infrastructure in a clinical setting, which has the potential to be valuable in caring for epileptic patients. Extension of such a system has the potential to be leveraged as a health management device or longitudinal phenotyping tool in clinical cohort studies.

    View details for DOI 10.1371/journal.pdig.0000161

    View details for PubMedID 36812648

    View details for PubMedCentralID PMC9931296

  • Identification of non-coding silencer elements and their regulation of gene expression. Nature reviews. Molecular cell biology Pang, B., van Weerd, J. H., Hamoen, F. L., Snyder, M. P. 2022

    Abstract

    Cell type- and differentiation-specific gene expression is precisely controlled by genomic non-coding regulatory elements (NCREs), which include promoters, enhancers, silencers and insulators. It is estimated that more than 90% of disease-associated sequence variants lie within the non-coding part of the genome, potentially affecting the activity of NCREs. Consequently, the functional annotation of NCREs is a major driver of genome research. Compared with our knowledge of other regulatory elements, our knowledge of silencers, which are NCREs that repress the transcription of genes, is largely lacking. Multiple recent studies have reported large-scale identification of transcription silencer elements, indicating their importance in homeostasis and disease. In this Review, we discuss the biology of silencers, including methods for their discovery, epigenomic and other characteristics, and modes of function of silencers. We also discuss important silencer-relevant considerations in assessing data from genome-wide association studies and shed light on potential future silencer-based therapeutic applications.

    View details for DOI 10.1038/s41580-022-00549-9

    View details for PubMedID 36344659

  • Performance effectiveness of vital parameter combinations for early warning of sepsis - an exhaustive study using machine learning. JAMIA open Rangan, E., Pathinarupothi, R., Anand, K. S., Snyder, M. P. 2022; 5 (4): ooac080

    Abstract

    To carry out exhaustive data-driven computations for the performance of noninvasive vital signs heart rate (HR), respiratory rate (RR), peripheral oxygen saturation (SpO2), and temperature (Temp), considered both independently and in all possible combinations, for early detection of sepsis. By extracting features interpretable by clinicians, we applied Gradient Boosted Decision Tree machine learning on a dataset of 2630 patients to build 240 models. Validation was performed on a geographically distinct dataset. Relative to onset, predictions were clocked as per 16 pairs of monitoring intervals and prediction times, and the outcomes were ranked. The combination of HR and Temp was found to be a minimal feature set yielding maximal predictability with area under receiver operating curve 0.94, sensitivity of 0.85, and specificity of 0.90. Whereas HR and RR each directly enhance prediction, the effects of SpO2 and Temp are significant only when combined with HR or RR. In benchmarking relative to standard methods Systemic Inflammatory Response Syndrome (SIRS), National Early Warning Score (NEWS), and quick-Sequential Organ Failure Assessment (qSOFA), Vital-SEP outperformed all 3 of them. It can be concluded that using intensive care unit data even 2 vital signs are adequate to predict sepsis up to 6 h in advance with promising accuracy comparable to standard scoring methods and other sepsis predictive tools reported in literature. Vital-SEP can be used for fast-track prediction especially in limited resource hospital settings where laboratory based hematologic or biochemical assays may be unavailable, inaccurate, or entail clinically inordinate delays. A prospective study is essential to determine the clinical impact of the proposed sepsis prediction model and evaluate other outcomes such as mortality and duration of hospital stay.

    View details for DOI 10.1093/jamiaopen/ooac080

    View details for Web of Science ID 000868349400001

    View details for PubMedID 36267121

    View details for PubMedCentralID PMC9566305
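
    The model count in the abstract above is consistent with a simple enumeration: four vital signs give 15 non-empty combinations, and 15 combinations across 16 interval pairs give 240 models. The Python sketch below only reproduces that counting. The reading that each model corresponds to one (combination, interval pair) is an assumption on our part and is not stated in the abstract, and the names `VITALS`, `N_INTERVAL_PAIRS` and `vital_sign_subsets` are hypothetical.

```python
from itertools import combinations

VITALS = ["HR", "RR", "SpO2", "Temp"]  # the four vital signs named in the abstract
N_INTERVAL_PAIRS = 16  # pairs of monitoring interval and prediction time


def vital_sign_subsets(vitals):
    """Return every non-empty combination of the given vital signs."""
    subsets = []
    for size in range(1, len(vitals) + 1):
        subsets.extend(combinations(vitals, size))
    return subsets


subsets = vital_sign_subsets(VITALS)
# Assumption: one model per vital-sign subset per interval pair.
n_models = len(subsets) * N_INTERVAL_PAIRS
print(len(subsets), n_models)
```

    Under that assumption the enumeration yields the 240 models reported, with the HR and Temp pair as one of the 15 candidate feature sets.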

  • Systems analysis of de novo mutations in congenital heart diseases identified a protein network in the hypoplastic left heart syndrome. Cell systems Wang, Y. J., Zhang, X., Lam, C. K., Guo, H., Wang, C., Zhang, S., Wu, J. C., Snyder, M., Li, J. 2022

    Abstract

    Despite a strong genetic component, only a few genes have been identified in congenital heart diseases (CHDs). We introduced systems analyses to uncover the hidden organization on biological networks of mutations in CHDs and leveraged network analysis to integrate the protein interactome, patient exomes, and single-cell transcriptomes of the developing heart. We identified a CHD network regulating heart development and observed that a sub-network also regulates fetal brain development, thereby providing mechanistic insights into the clinical comorbidities between CHDs and neurodevelopmental conditions. At a small scale, we experimentally verified uncharacterized cardiac functions of several proteins. At a global scale, our study revealed developmental dynamics of the network and observed its association with the hypoplastic left heart syndrome (HLHS), which was further supported by the dysregulation of the network in HLHS endothelial cells. Overall, our work identified previously uncharacterized CHD factors and provided a generalizable framework applicable to studying many other complex diseases. A record of this paper's Transparent Peer Review process is included in the supplemental information.

    View details for DOI 10.1016/j.cels.2022.09.001

    View details for PubMedID 36167075

  • Chimpanzee and pig-tailed macaque iPSCs: Improved culture and generation of primate cross-species embryos. Cell reports Roodgar, M., Suchy, F. P., Nguyen, L. H., Bajpai, V. K., Sinha, R., Vilches-Moure, J. G., Van Bortle, K., Bhadury, J., Metwally, A., Jiang, L., Jian, R., Chiang, R., Oikonomopoulos, A., Wu, J. C., Weissman, I. L., Mankowski, J. L., Holmes, S., Loh, K. M., Nakauchi, H., VandeVoort, C. A., Snyder, M. P. 2022; 40 (9): 111264

    Abstract

    As our closest living relatives, non-human primates uniquely enable explorations of human health, disease, development, and evolution. Considerable effort has thus been devoted to generating induced pluripotent stem cells (iPSCs) from multiple non-human primate species. Here, we establish improved culture methods for chimpanzee (Pan troglodytes) and pig-tailed macaque (Macaca nemestrina) iPSCs. Such iPSCs spontaneously differentiate in conventional culture conditions, but can be readily propagated by inhibiting endogenous WNT signaling. As a unique functional test of these iPSCs, we injected them into the pre-implantation embryos of another non-human species, rhesus macaques (Macaca mulatta). Ectopic expression of gene BCL2 enhances the survival and proliferation of chimpanzee and pig-tailed macaque iPSCs within the pre-implantation embryo, although the identity and long-term contribution of the transplanted cells warrant further investigation. In summary, we disclose transcriptomic and proteomic data, cell lines, and cell culture resources that may be broadly enabling for non-human primate iPSC research.

    View details for DOI 10.1016/j.celrep.2022.111264

    View details for PubMedID 36044843

  • massDatabase: utilities for the operation of the public compound and pathway database. Bioinformatics (Oxford, England) Shen, X., Wang, C., Snyder, M. P. 2022

    Abstract

    SUMMARY: One of the major challenges in LC-MS data is converting many metabolic feature entries to biological function information, such as metabolite annotation and pathway enrichment, which are based on the compound and pathway databases. Multiple online databases have been developed. However, no tool has been developed for operating all these databases for biological analysis. Therefore, we developed massDatabase, an R package that operates the online public databases and combines with other tools for streamlined compound annotation and pathway enrichment. massDatabase is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the users to leverage all the online public databases for biological function mining. A detailed tutorial and a case study are provided in the Supplementary Materials. AVAILABILITY AND IMPLEMENTATION: https://massdatabase.tidymass.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btac546

    View details for PubMedID 35944213

  • TidyMass an object-oriented reproducible analysis framework for LC-MS data. Nature communications Shen, X., Yan, H., Wang, C., Gao, P., Johnson, C. H., Snyder, M. P. 2022; 13 (1): 4365

    Abstract

    Reproducibility, traceability, and transparency have been long-standing issues for metabolomics data analysis. Multiple tools have been developed, but limitations still exist. Here, we present the tidyMass project (https://www.tidymass.org/), a comprehensive R-based computational framework that can achieve the traceable, shareable, and reproducible workflow needs of data processing and analysis for LC-MS-based untargeted metabolomics. TidyMass is an ecosystem of R packages that share an underlying design philosophy, grammar, and data structure, which provides a comprehensive, reproducible, and object-oriented computational framework. The modular architecture makes tidyMass a highly flexible and extensible tool, which other users can improve and integrate with other tools to customize their own pipeline.

    View details for DOI 10.1038/s41467-022-32155-w

    View details for PubMedID 35902589

  • Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer. Nature genetics Becker, W. R., Nevins, S. A., Chen, D. C., Chiu, R., Horning, A. M., Guha, T. K., Laquindanum, R., Mills, M., Chaib, H., Ladabaum, U., Longacre, T., Shen, J., Esplin, E. D., Kundaje, A., Ford, J. M., Curtis, C., Snyder, M. P., Greenleaf, W. J. 2022

    Abstract

    To chart cell composition and cell state changes that occur during the transformation of healthy colon to precancerous adenomas to colorectal cancer (CRC), we generated single-cell chromatin accessibility profiles and single-cell transcriptomes from 1,000 to 10,000 cells per sample for 48 polyps, 27 normal tissues and 6 CRCs collected from patients with or without germline APC mutations. A large fraction of polyp and CRC cells exhibit a stem-like phenotype, and we define a continuum of epigenetic and transcriptional changes occurring in these stem-like cells as they progress from homeostasis to CRC. Advanced polyps contain increasing numbers of stem-like cells, regulatory T cells and a subtype of pre-cancer-associated fibroblasts. In the cancerous state, we observe T cell exhaustion, RUNX1-regulated cancer-associated fibroblasts and increasing accessibility associated with HNF4A motifs in epithelia. DNA methylation changes in sporadic CRC are strongly anti-correlated with accessibility changes along this continuum, further identifying regulatory markers for molecular staging of polyps.

    View details for DOI 10.1038/s41588-022-01088-x

    View details for PubMedID 35726067

  • Precision environmental health monitoring by longitudinal exposome and multi-omics profiling. Genome research Gao, P., Shen, X., Zhang, X., Jiang, C., Zhang, S., Zhou, X., Schüssler-Fiorenza Rose, S. M., Snyder, M. 2022

    Abstract

    Conventional environmental health studies have primarily focused on limited environmental stressors at the population level, which lacks the power to dissect the complexity and heterogeneity of individualized environmental exposures. Here, as a pilot case study, we integrated deep-profiled longitudinal personal exposome and internal multi-omics to systematically investigate how the exposome shapes a single individual's phenome. We annotated thousands of chemical and biological components in the personal exposome cloud and found they were significantly correlated with thousands of internal biomolecules, which was further cross-validated using corresponding clinical data. Our results showed that agrochemicals and fungi predominated in the highly diverse and dynamic personal exposome, and the biomolecules and pathways related to the individual's immune system, kidney, and liver were highly associated with the personal external exposome. Overall, this data-driven longitudinal monitoring study shows the potential dynamic interactions between the personal exposome and internal multi-omics, as well as the impact of the exposome on precision health by producing abundant testable hypotheses.

    View details for DOI 10.1101/gr.276521.121

    View details for PubMedID 35667843

  • Multiomic analysis reveals cell-type-specific molecular determinants of COVID-19 severity. Cell systems Zhang, S., Cooper-Knock, J., Weimer, A. K., Shi, M., Kozhaya, L., Unutmaz, D., Harvey, C., Julian, T. H., Furini, S., Frullanti, E., Fava, F., Renieri, A., Gao, P., Shen, X., Timpanaro, I. S., Kenna, K. P., Baillie, J. K., Davis, M. M., Tsao, P. S., Snyder, M. P. 2022

    Abstract

    The determinants of severe COVID-19 in healthy adults are poorly understood, which limits the opportunity for early intervention. We present a multiomic analysis using machine learning to characterize the genomic basis of COVID-19 severity. We use single-cell multiome profiling of human lungs to link genetic signals to cell-type-specific functions. We discover >1,000 risk genes across 19 cell types, which account for 77% of the SNP-based heritability for severe disease. Genetic risk is particularly focused within natural killer (NK) cells and T cells, placing the dysfunction of these cells upstream of severe disease. Mendelian randomization and single-cell profiling of human NK cells support the role of NK cells and further localize genetic risk to CD56bright NK cells, which are key cytokine producers during the innate immune response. Rare variant analysis confirms the enrichment of severe-disease-associated genetic variation within NK-cell risk genes. Our study provides insights into the pathogenesis of severe COVID-19 with potential therapeutic targets.

    View details for DOI 10.1016/j.cels.2022.05.007

    View details for PubMedID 35690068

  • Global, distinctive, and personal changes in molecular and microbial profiles by specific fibers in humans. Cell host & microbe Lancaster, S. M., Lee-McMullen, B., Abbott, C. W., Quijada, J. V., Hornburg, D., Park, H., Perelman, D., Peterson, D. J., Tang, M., Robinson, A., Ahadi, S., Contrepois, K., Hung, C., Ashland, M., McLaughlin, T., Boonyanit, A., Horning, A., Sonnenburg, J. L., Snyder, M. P. 2022

    Abstract

    Dietary fibers act through the microbiome to improve cardiovascular health and prevent metabolic disorders and cancer. To understand the health benefits of dietary fiber supplementation, we investigated two popular purified fibers, arabinoxylan (AX) and long-chain inulin (LCI), and a mixture of five fibers. We present multiomic signatures of metabolomics, lipidomics, proteomics, metagenomics, a cytokine panel, and clinical measurements on healthy and insulin-resistant participants. Each fiber is associated with fiber-dependent biochemical and microbial responses. AX consumption associates with a significant reduction in LDL and an increase in bile acids, contributing to its observed cholesterol reduction. LCI is associated with an increase in Bifidobacterium. However, at the highest LCI dose, there is increased inflammation and elevation in the liver enzyme alanine aminotransferase. This study yields insights into the effects of fiber supplementation and the mechanisms behind fiber-induced cholesterol reduction, and it shows effects of individual, purified fibers on the microbiome.

    View details for DOI 10.1016/j.chom.2022.03.036

    View details for PubMedID 35483363

  • Adverse childhood experiences, diabetes and associated conditions, preventive care practices and healthcare access: A population-based study. Preventive medicine Rose, S. M., Slavich, G. M., Snyder, M. P. 2022: 107044

    Abstract

    Our objective was to examine associations between Adverse Childhood Experiences (ACEs) and diabetes mellitus, including related conditions and preventive care practices. We used data from the Behavioral Risk Factor Surveillance System (BRFSS) 2009-2012, a cross-sectional, population-based survey, to assess ACEs, diabetes, and healthcare access in 179,375 adults. In those with diabetes (n = 21,007), we assessed the association of ACEs with myocardial infarction, stroke, and five Healthy People 2020 (HP2020) diabetes-related preventive-care objectives (n = 13,152). Healthcare access indicators included lack of a regular healthcare provider, insurance, and difficulty affording healthcare. Regression analyses adjusted for age, sex, and race. The adjusted odds ratio (AOR) of diabetes increased in a stepwise fashion by ACE exposure, ranging from 1.2 (95% CI 1.1-1.3) for 1 ACE to 1.7 (95% CI 1.6-1.9) for ≥4 ACEs, versus having no ACEs. In persons with diabetes, those with ≥4 ACEs had an elevated adjusted odds of myocardial infarction (AOR = 1.6, 95% CI 1.2-2.0) and stroke (AOR = 1.8, 95% CI 1.3-2.4), versus having no ACEs. ACEs were also associated with a reduction in the adjusted percent of HP2020 diabetes objectives met: 72.9% (95% CI 71.3-74.5) for those with no ACEs versus only 66.5% (95% CI 63.8-69.3%) for those with ≥4 ACEs (p = 0.0002). Finally, ACEs predicted worse healthcare access in a stepwise fashion for all indicators. In conclusion, ACEs are associated with greater prevalence of diabetes and associated conditions, and with meeting fewer HP2020 prevention goals. ACEs screening and trauma-informed care practices are thus recommended.

    View details for DOI 10.1016/j.ypmed.2022.107044

    View details for PubMedID 35398366

  • Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron Zhang, S., Cooper-Knock, J., Weimer, A. K., Shi, M., Moll, T., Marshall, J. N., Harvey, C., Nezhad, H. G., Franklin, J., Souza, C. D., Ning, K., Wang, C., Li, J., Dilliott, A. A., Farhan, S., Elhaik, E., Pasniceanu, I., Livesey, M. R., Eitan, C., Hornstein, E., Kenna, K. P., Project MinE ALS Sequencing Consortium, Veldink, J. H., Ferraiuolo, L., Shaw, P. J., Snyder, M. P., Blair, I., Wray, N. R., Kiernan, M., Mitne Neto, M., Chio, A., Cauchi, R., Robberecht, W., van Damme, P., Corcia, P., Couratier, P., Hardiman, O., McLaughin, R., Gotkine, M., Drory, V., Ticozzi, N., Silani, V., Veldink, J. H., van den Berg, L. H., de Carvalho, M., Mora Pardina, J. S., Povedano, M., Andersen, P., Weber, M., Basak, N. A., Al-Chalabi, A., Shaw, C., Shaw, P. J., Morrison, K. E., Landers, J. E., Glass, J. D. 2022

    Abstract

    Amyotrophic lateral sclerosis (ALS) is a complex disease that leads to motor neuron death. Despite heritability estimates of 52%, genome-wide association studies (GWASs) have discovered relatively few loci. We developed a machine learning approach called RefMap, which integrates functional genomics with GWAS summary statistics for gene discovery. With transcriptomic and epigenetic profiling of motor neurons derived from induced pluripotent stem cells (iPSCs), RefMap identified 690 ALS-associated genes that represent a 5-fold increase in recovered heritability. Extensive conservation, transcriptome, network, and rare variant analyses demonstrated the functional significance of candidate genes in healthy and diseased motor neurons and brain tissues. Genetic convergence between common and rare variation highlighted KANK1 as a new ALS gene. Reproducing KANK1 patient mutations in human neurons led to neurotoxicity and demonstrated that TDP-43 mislocalization, a hallmark pathology of ALS, is downstream of axonal dysfunction. RefMap can be readily applied to other complex diseases.

    View details for DOI 10.1016/j.neuron.2021.12.019

    View details for PubMedID 35045337

  • Phenotypic characteristics of peripheral immune cells of Myalgic encephalomyelitis/chronic fatigue syndrome via transmission electron microscopy: A pilot study. PloS one Jahanbani, F., Maynard, R. D., Sing, J. C., Jahanbani, S., Perrino, J. J., Spacek, D. V., Davis, R. W., Snyder, M. P. 2022; 17 (8): e0272703

    Abstract

    Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex chronic multi-systemic disease characterized by extreme fatigue that is not improved by rest, and worsens after exertion, whether physical or mental. Previous studies have shown ME/CFS-associated alterations in the immune system and mitochondria. We used transmission electron microscopy (TEM) to investigate the morphology and ultrastructure of unstimulated and stimulated ME/CFS immune cells and their intracellular organelles, including mitochondria. PBMCs from four participants were studied: a pair of identical twins discordant for moderate ME/CFS, as well as two age- and gender-matched unrelated subjects, one with an extremely severe form of ME/CFS and the other healthy. TEM analysis of CD3/CD28-stimulated T cells suggested a significant increase in the levels of apoptotic and necrotic cell death in T cells from ME/CFS patients (over 2-fold). Stimulated T cells of ME/CFS patients also had higher numbers of swollen mitochondria. We also found a large increase in intracellular giant lipid droplet-like organelles in the stimulated PBMCs from the extremely severe ME/CFS patient, potentially indicative of a lipid storage disorder. Lastly, we observed a slight increase in platelet aggregation in stimulated cells, suggestive of a possible role of platelet activity in ME/CFS pathophysiology and disease severity. These results indicate extensive morphological alterations in the cellular and mitochondrial phenotypes of ME/CFS patients' immune cells and suggest new insights into ME/CFS biology.

    View details for DOI 10.1371/journal.pone.0272703

    View details for PubMedID 35943990

  • Real-time alerting system for COVID-19 and other stress events using wearable data. Nature medicine Alavi, A., Bogu, G. K., Wang, M., Rangan, E. S., Brooks, A. W., Wang, Q., Higgs, E., Celli, A., Mishra, T., Metwally, A. A., Cha, K., Knowles, P., Alavi, A. A., Bhasin, R., Panchamukhi, S., Celis, D., Aditya, T., Honkala, A., Rolnik, B., Hunting, E., Dagan-Rosenfeld, O., Chauhan, A., Li, J. W., Bejikian, C., Krishnan, V., McGuire, L., Li, X., Bahmani, A., Snyder, M. P. 2021

    Abstract

    Early detection of infectious diseases is crucial for reducing transmission and facilitating early intervention. In this study, we built a real-time smartwatch-based alerting system that detects aberrant physiological and activity signals (heart rates and steps) associated with the onset of early infection and implemented this system in a prospective study. In a cohort of 3,318 participants, of whom 84 were infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), this system generated alerts for pre-symptomatic and asymptomatic SARS-CoV-2 infection in 67 (80%) of the infected individuals. Pre-symptomatic signals were observed at a median of 3 days before symptom onset. Examination of detailed survey responses provided by the participants revealed that other respiratory infections as well as events not associated with infection, such as stress, alcohol consumption and travel, could also trigger alerts, albeit at a much lower mean frequency (1.15 alert days per person compared to 3.42 alert days per person for coronavirus disease 2019 cases). Thus, analysis of smartwatch signals by an online detection algorithm provides advance warning of SARS-CoV-2 infection in a high percentage of cases. This study shows that a real-time alerting system can be used for early detection of infection and other stressors and employed on an open-source platform that is scalable to millions of users.

    View details for DOI 10.1038/s41591-021-01593-2

    View details for PubMedID 34845389

  • A scalable, secure, and interoperable platform for deep data-driven health management. Nature communications Bahmani, A., Alavi, A., Buergel, T., Upadhyayula, S., Wang, Q., Ananthakrishnan, S. K., Alavi, A., Celis, D., Gillespie, D., Young, G., Xing, Z., Nguyen, M. H., Haque, A., Mathur, A., Payne, J., Mazaheri, G., Li, J. K., Kotipalli, P., Liao, L., Bhasin, R., Cha, K., Rolnik, B., Celli, A., Dagan-Rosenfeld, O., Higgs, E., Zhou, W., Berry, C. L., Van Winkle, K. G., Contrepois, K., Ray, U., Bettinger, K., Datta, S., Li, X., Snyder, M. P. 2021; 12 (1): 5757

    Abstract

    The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only is generating enormous opportunities for improving health outcomes but also raises new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed to any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual level and the cohort level, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as collecting and visualization of diverse data types (wearable, clinical, omics) at a personal level, investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.

    View details for DOI 10.1038/s41467-021-26040-1

    View details for PubMedID 34599181

  • Chromatin accessibility associates with protein-RNA correlation in human cancer. Nature communications Sanghi, A., Gruber, J. J., Metwally, A., Jiang, L., Reynolds, W., Sunwoo, J., Orloff, L., Chang, H. Y., Kasowski, M., Snyder, M. P. 2021; 12 (1): 5732

    Abstract

    Although alterations in chromatin structure are known to exist in tumors, how these alterations relate to molecular phenotypes in cancer remains to be demonstrated. Multi-omics profiling of human tumors can provide insight into how alterations in chromatin structure are propagated through the pathway of gene expression to result in malignant protein expression. We applied multi-omics profiling of chromatin accessibility, RNA abundance, and protein abundance to 36 human thyroid cancer primary tumors, metastases, and patient-matched normal tissue. Through quantification of chromatin accessibility associated with active transcription units and global protein expression, we identify a local chromatin structure that is highly correlated with coordinated RNA and protein expression. In particular, we identify enhancers located within gene bodies as predictive of correlated RNA and protein expression that is independent of overall transcriptional activity. To demonstrate the generalizability of these findings we also identify similar results in an independent cohort of human breast cancers. Taken together, these analyses suggest that local enhancers, rather than distal enhancers, are likely most predictive of cancer gene expression phenotypes. This allows for identification of potential targets for cancer therapeutic approaches and reinforces the utility of multi-omics profiling as a methodology to understand human disease.

    View details for DOI 10.1038/s41467-021-25872-1

    View details for PubMedID 34593797

  • metID: an R package for automatable compound annotation for LC-MS-based data. Bioinformatics (Oxford, England) Shen, X., Wu, S., Liang, L., Chen, S., Contrepois, K., Zhu, Z., Snyder, M. 2021

    Abstract

    SUMMARY: Accurate and efficient compound annotation is a long-standing challenge for LC-MS-based data (e.g., untargeted metabolomics and exposomics). Substantial efforts have been devoted to overcoming this obstacle, whereas current tools are limited by the sources of spectral information used (in-house and public databases) and are not automated and streamlined. Therefore, we developed metID, an R package that combines information from all major databases for comprehensive and streamlined compound annotation. metID is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the compound annotation process to be fully automatic and reproducible. A detailed tutorial and a case study are provided in Supplementary Materials. AVAILABILITY AND IMPLEMENTATION: https://jaspershen.github.io/metID. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btab583

    View details for PubMedID 34432001

  • Five-year pediatric use of a digital wearable fitness device: lessons from a pilot case study. JAMIA open Butte, K. D., Bahmani, A., Butte, A. J., Li, X., Snyder, M. P. 2021; 4 (3): ooab054

    Abstract

    Objectives: Wearable fitness devices are increasingly being used by the general population, with many new applications being proposed for healthy adults as well as for adults with chronic diseases. Fewer, if any, studies of these devices have been conducted in healthy adolescents and teenagers, especially over a long period of time. The goal of this work was to document the successes and challenges involved in 5 years of wearable fitness device use in a pediatric case study. Materials and methods: Comparison of 5 years of step counts and minutes asleep from a teenaged girl and her father. Results: At 60 months, this may be the longest reported pediatric study involving a wearable fitness device, and the first simultaneously involving a parent and a child. We find step counts to be significantly higher for both the adult and teen on school/work days, along with less sleep. The teen walked significantly less towards the end of the 5-year study. Surprisingly, many of the adult's and teen's sleeping and step counts were correlated, possibly due to coordinated behaviors. Discussion: We end with several recommendations for pediatricians and device manufacturers, including the need for constant adjustments of stride length and calorie counts as teens are growing. Conclusion: With periodic adjustments for growth, this pilot study shows these devices can be used for more accurate and consistent measurements in adolescents and teenagers over longer periods of time, to potentially promote healthy behaviors.

    View details for DOI 10.1093/jamiaopen/ooab054

    View details for PubMedID 34350390

  • Wearable sensors enable personalized predictions of clinical laboratory measurements. Nature medicine Dunn, J., Kidzinski, L., Runge, R., Witt, D., Hicks, J. L., Schussler-Fiorenza Rose, S. M., Li, X., Bahmani, A., Delp, S. L., Hastie, T., Snyder, M. P. 2021

    Abstract

    Vital signs, including heart rate and body temperature, are useful in detecting or monitoring medical conditions, but are typically measured in the clinic and require follow-up laboratory testing for more definitive diagnoses. Here we examined whether vital signs as measured by consumer wearable devices (that is, continuously monitored heart rate, body temperature, electrodermal activity and movement) can predict clinical laboratory test results using machine learning models, including random forest and Lasso models. Our results demonstrate that vital sign data collected from wearables give a more consistent and precise depiction of resting heart rate than do measurements taken in the clinic. Vital sign data collected from wearables can also predict several clinical laboratory measurements with lower prediction error than predictions made using clinically obtained vital sign measurements. The length of time over which vital signs are monitored and the proximity of the monitoring period to the date of prediction play a critical role in the performance of the machine learning models. These results demonstrate the value of commercial wearable devices for continuous and longitudinal assessment of physiological measurements that today can be measured only with clinical laboratory tests.

    View details for DOI 10.1038/s41591-021-01339-0

    View details for PubMedID 34031607

  • Pre-symptomatic detection of COVID-19 from smartwatch data. Nature biomedical engineering Mishra, T., Wang, M., Metwally, A. A., Bogu, G. K., Brooks, A. W., Bahmani, A., Alavi, A., Celli, A., Higgs, E., Dagan-Rosenfeld, O., Fay, B., Kirkpatrick, S., Kellogg, R., Gibson, M., Wang, T., Hunting, E. M., Mamic, P., Ganz, A. B., Rolnik, B., Li, X., Snyder, M. P. 2020

    Abstract

    Consumer wearable devices that continuously measure vital signs have been used to monitor the onset of infectious disease. Here, we show that data from consumer smartwatches can be used for the pre-symptomatic detection of coronavirus disease 2019 (COVID-19). We analysed physiological and activity data from 32 individuals infected with COVID-19, identified from a cohort of nearly 5,300 participants, and found that 26 of them (81%) had alterations in their heart rate, number of daily steps or time asleep. Of the 25 cases of COVID-19 with detected physiological alterations for which we had symptom information, 22 were detected before (or at) symptom onset, with four cases detected at least nine days earlier. Using retrospective smartwatch data, we show that 63% of the COVID-19 cases could have been detected before symptom onset in real time via a two-tiered warning system based on the occurrence of extreme elevations in resting heart rate relative to the individual baseline. Our findings suggest that activity tracking and health monitoring via consumer wearable devices may be used for the large-scale, real-time detection of respiratory infections, often pre-symptomatically.

    View details for DOI 10.1038/s41551-020-00640-6

    View details for PubMedID 33208926

  • An integrative ENCODE resource for cancer genomics. Nature communications Zhang, J., Lee, D., Dhiman, V., Jiang, P., Xu, J., McGillivray, P., Yang, H., Liu, J., Meyerson, W., Clarke, D., Gu, M., Li, S., Lou, S., Xu, J., Lochovsky, L., Ung, M., Ma, L., Yu, S., Cao, Q., Harmanci, A., Yan, K., Sethi, A., Gursoy, G., Schoenberg, M. R., Rozowsky, J., Warrell, J., Emani, P., Yang, Y. T., Galeev, T., Kong, X., Liu, S., Li, X., Krishnan, J., Feng, Y., Rivera-Mulia, J. C., Adrian, J., Broach, J. R., Bolt, M., Moran, J., Fitzgerald, D., Dileep, V., Liu, T., Mei, S., Sasaki, T., Trevilla-Garcia, C., Wang, S., Wang, Y., Zang, C., Wang, D., Klein, R. J., Snyder, M., Gilbert, D. M., Yip, K., Cheng, C., Yue, F., Liu, X. S., White, K. P., Gerstein, M. 2020; 11 (1): 3696

    Abstract

    ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

    View details for DOI 10.1038/s41467-020-14743-w

    View details for PubMedID 32728046

  • Perspectives on ENCODE. Nature ENCODE Project Consortium, Snyder, M. P., Gingeras, T. R., Moore, J. E., Weng, Z., Gerstein, M. B., Ren, B., Hardison, R. C., Stamatoyannopoulos, J. A., Graveley, B. R., Feingold, E. A., Pazin, M. J., Pagan, M., Gilchrist, D. A., Hitz, B. C., Cherry, J. M., Bernstein, B. E., Mendenhall, E. M., Zerbino, D. R., Frankish, A., Flicek, P., Myers, R. M., Abascal, F., Acosta, R., Addleman, N. J., Adrian, J., Afzal, V., Aken, B., Akiyama, J. A., Jammal, O. A., Amrhein, H., Anderson, S. M., Andrews, G. R., Antoshechkin, I., Ardlie, K. G., Armstrong, J., Astley, M., Banerjee, B., Barkal, A. A., Barnes, I. H., Barozzi, I., Barrell, D., Barson, G., Bates, D., Baymuradov, U. K., Bazile, C., Beer, M. A., Beik, S., Bender, M. A., Bennett, R., Bouvrette, L. P., Bernstein, B. E., Berry, A., Bhaskar, A., Bignell, A., Blue, S. M., Bodine, D. M., Boix, C., Boley, N., Borrman, T., Borsari, B., Boyle, A. P., Brandsmeier, L. A., Breschi, A., Bresnick, E. H., Brooks, J. A., Buckley, M., Burge, C. B., Byron, R., Cahill, E., Cai, L., Cao, L., Carty, M., Castanon, R. G., Castillo, A., Chaib, H., Chan, E. T., Chee, D. R., Chee, S., Chen, H., Chen, H., Chen, J., Chen, S., Cherry, J. M., Chhetri, S. B., Choudhary, J. S., Chrast, J., Chung, D., Clarke, D., Cody, N. A., Coppola, C. J., Coursen, J., D'Ippolito, A. M., Dalton, S., Danyko, C., Davidson, C., Davila-Velderrain, J., Davis, C. A., Dekker, J., Deran, A., DeSalvo, G., Despacio-Reyes, G., Dewey, C. N., Dickel, D. E., Diegel, M., Diekhans, M., Dileep, V., Ding, B., Djebali, S., Dobin, A., Dominguez, D., Donaldson, S., Drenkow, J., Dreszer, T. R., Drier, Y., Duff, M. O., Dunn, D., Eastman, C., Ecker, J. R., Edwards, M. D., El-Ali, N., Elhajjajy, S. I., Elkins, K., Emili, A., Epstein, C. B., Evans, R. C., Ezkurdia, I., Fan, K., Farnham, P. J., Farrell, N., Feingold, E. A., Ferreira, A., Fisher-Aylor, K., Fitzgerald, S., Flicek, P., Foo, C. S., Fortier, K., Frankish, A., Freese, P., Fu, S., Fu, X., Fu, Y., Fukuda-Yuzawa, Y., Fulciniti, M., Funnell, A. P., Gabdank, I., Galeev, T., Gao, M., Giron, C. G., Garvin, T. H., Gelboin-Burkhart, C. A., Georgolopoulos, G., Gerstein, M. B., Giardine, B. M., Gifford, D. K., Gilbert, D. M., Gilchrist, D. A., Gillespie, S., Gingeras, T. R., Gong, P., Gonzalez, A., Gonzalez, J. M., Good, P., Goren, A., Gorkin, D. U., Graveley, B. R., Gray, M., Greenblatt, J. F., Griffiths, E., Groudine, M. T., Grubert, F., Gu, M., Guigo, R., Guo, H., Guo, Y., Guo, Y., Gursoy, G., Gutierrez-Arcelus, M., Halow, J., Hardison, R. C., Hardy, M., Hariharan, M., Harmanci, A., Harrington, A., Harrow, J. L., Hashimoto, T. B., Hasz, R. D., Hatan, M., Haugen, E., Hayes, J. E., He, P., He, Y., Heidari, N., Hendrickson, D., Heuston, E. F., Hilton, J. A., Hitz, B. C., Hochman, A., Holgren, C., Hou, L., Hou, S., Hsiao, Y. E., Hsu, S., Huang, H., Hubbard, T. J., Huey, J., Hughes, T. R., Hunt, T., Ibarrientos, S., Issner, R., Iwata, M., Izuogu, O., Jaakkola, T., Jameel, N., Jansen, C., Jiang, L., Jiang, P., Johnson, A., Johnson, R., Jungreis, I., Kadaba, M., Kasowski, M., Kasparian, M., Kato, M., Kaul, R., Kawli, T., Kay, M., Keen, J. C., Keles, S., Keller, C. A., Kelley, D., Kellis, M., Kheradpour, P., Kim, D. S., Kirilusha, A., Klein, R. J., Knoechel, B., Kuan, S., Kulik, M. J., Kumar, S., Kundaje, A., Kutyavin, T., Lagarde, J., Lajoie, B. R., Lambert, N. J., Lazar, J., Lee, A. Y., Lee, D., Lee, E., Lee, J. W., Lee, K., Leslie, C. S., Levy, S., Li, B., Li, H., Li, N., Li, X., Li, Y. I., Li, Y., Li, Y., Li, Y., Lian, J., Libbrecht, M. W., Lin, S., Lin, Y., Liu, D., Liu, J., Liu, P., Liu, T., Liu, X. S., Liu, Y., Liu, Y., Long, M., Lou, S., Loveland, J., Lu, A., Lu, Y., Lecuyer, E., Ma, L., Mackiewicz, M., Mannion, B. J., Mannstadt, M., Manthravadi, D., Marinov, G. K., Martin, F. J., Mattei, E., McCue, K., McEown, M., McVicker, G., Meadows, S. K., Meissner, A., Mendenhall, E. M., Messer, C. L., Meuleman, W., Meyer, C., Miller, S., Milton, M. G., Mishra, T., Moore, D. E., Moore, H. M., Moore, J. E., Moore, S. H., Moran, J., Mortazavi, A., Mudge, J. M., Munshi, N., Murad, R., Myers, R. M., Nandakumar, V., Nandi, P., Narasimha, A. M., Narayanan, A. K., Naughton, H., Navarro, F. C., Navas, P., Nazarovs, J., Nelson, J., Neph, S., Neri, F. J., Nery, J. R., Nesmith, A. R., Newberry, J. S., Newberry, K. M., Ngo, V., Nguyen, R., Nguyen, T. B., Nguyen, T., Nishida, A., Noble, W. S., Novak, C. S., Novoa, E. M., Nunez, B., O'Donnell, C. W., Olson, S., Onate, K. C., Otterman, E., Ozadam, H., Pagan, M., Palden, T., Pan, X., Park, Y., Partridge, E. C., Paten, B., Pauli-Behn, F., Pazin, M. J., Pei, B., Pennacchio, L. A., Perez, A. R., Perry, E. H., Pervouchine, D. D., Phalke, N. N., Pham, Q., Phanstiel, D. H., Plajzer-Frick, I., Pratt, G. A., Pratt, H. E., Preissl, S., Pritchard, J. K., Pritykin, Y., Purcaro, M. J., Qin, Q., Quinones-Valdez, G., Rabano, I., Radovani, E., Raj, A., Rajagopal, N., Ram, O., Ramirez, L., Ramirez, R. N., Rausch, D., Raychaudhuri, S., Raymond, J., Razavi, R., Reddy, T. E., Reimonn, T. M., Ren, B., Reymond, A., Reynolds, A., Rhie, S. K., Rinn, J., Rivera, M., Rivera-Mulia, J. C., Roberts, B., Rodriguez, J. M., Rozowsky, J., Ryan, R., Rynes, E., Salins, D. N., Sandstrom, R., Sasaki, T., Sathe, S., Savic, D., Scavelli, A., Scheiman, J., Schlaffner, C., Schloss, J. A., Schmitges, F. W., See, L. H., Sethi, A., Setty, M., Shafer, A., Shan, S., Sharon, E., Shen, Q., Shen, Y., Sherwood, R. I., Shi, M., Shin, S., Shoresh, N., Siebenthall, K., Sisu, C., Slifer, T., Sloan, C. A., Smith, A., Snetkova, V., Snyder, M. P., Spacek, D. V., Srinivasan, S., Srivas, R., Stamatoyannopoulos, G., Stamatoyannopoulos, J. A., Stanton, R., Steffan, D., Stehling-Sun, S., Strattan, J. S., Su, A., Sundararaman, B., Suner, M., Syed, T., Szynkarek, M., Tanaka, F. Y., Tenen, D., Teng, M., Thomas, J. A., Toffey, D., Tress, M. L., Trout, D. E., Trynka, G., Tsuji, J., Upchurch, S. A., Ursu, O., Uszczynska-Ratajczak, B., Uziel, M. C., Valencia, A., Biber, B. V., van der Velde, A. G., Van Nostrand, E. L., Vaydylevich, Y., Vazquez, J., Victorsen, A., Vielmetter, J., Vierstra, J., Visel, A., Vlasova, A., Vockley, C. M., Volpi, S., Vong, S., Wang, H., Wang, M., Wang, Q., Wang, R., Wang, T., Wang, W., Wang, X., Wang, Y., Watson, N. K., Wei, X., Wei, Z., Weisser, H., Weissman, S. M., Welch, R., Welikson, R. E., Weng, Z., Westra, H., Whitaker, J. W., White, C., White, K. P., Wildberg, A., Williams, B. A., Wine, D., Witt, H. N., Wold, B., Wolf, M., Wright, J., Xiao, R., Xiao, X., Xu, J., Xu, J., Yan, K., Yan, Y., Yang, H., Yang, X., Yang, Y., Yardimci, G. G., Yee, B. A., Yeo, G. W., Young, T., Yu, T., Yue, F., Zaleski, C., Zang, C., Zeng, H., Zeng, W., Zerbino, D. R., Zhai, J., Zhan, L., Zhan, Y., Zhang, B., Zhang, J., Zhang, J., Zhang, K., Zhang, L., Zhang, P., Zhang, Q., Zhang, X., Zhang, Y., Zhang, Z., Zhao, Y., Zheng, Y., Zhong, G., Zhou, X., Zhu, Y., Zimmerman, J. 2020; 583 (7818): 693–98

    Abstract

    The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression1. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.

    View details for DOI 10.1038/s41586-020-2449-8

    View details for PubMedID 32728248

  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women. Cell Liang, L., Rasmussen, M. H., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. C., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 181 (7): 1680

    Abstract

    Metabolism during pregnancy is a dynamic and precisely programmed process, the failure of which can bring devastating consequences to the mother and fetus. To define a high-resolution temporal profile of metabolites during healthy pregnancy, we analyzed the untargeted metabolome of 784 weekly blood samples from 30 pregnant women. Broad changes and a highly choreographed profile were revealed: 4,995 metabolic features (of 9,651 total), 460 annotated compounds (of 687 total), and 34 human metabolic pathways (of 48 total) were significantly changed during pregnancy. Using linear models, we built a metabolic clock with five metabolites that time gestational age in high accordance with ultrasound (R = 0.92). Furthermore, two to three metabolites can identify when labor occurs (time to delivery within two, four, and eight weeks, AUROC ≥ 0.85). Our study represents a weekly characterization of the human pregnancy metabolome, providing a high-resolution landscape for understanding pregnancy with potential clinical utilities.

    View details for DOI 10.1016/j.cell.2020.05.002

    View details for PubMedID 32589958

  • Systematic identification of silencers in human cells. Nature genetics Pang, B., Snyder, M. P. 2020

    Abstract

    The majority of the human genome does not encode proteins. Many of these noncoding regions contain important regulatory sequences that control gene expression. To date, most studies have focused on activators such as enhancers, but regions that repress gene expression (silencers) have not been systematically studied. We have developed a system that identifies silencer regions in a genome-wide fashion on the basis of silencer-mediated transcriptional repression of caspase9. We found that silencers are widely distributed and may function in a tissue-specific fashion. These silencers harbor unique epigenetic signatures and are associated with specific transcription factors. Silencers also act at multiple genes, and at the level of chromosomal domains and long-range interactions. Deletion of silencer regions linked to the drug transporter genes ABCC2 and ABCG2 caused chemo-resistance. Overall, our study demonstrates that tissue-specific silencing is widespread throughout the human genome and probably contributes substantially to the regulation of gene expression and human biology.

    View details for DOI 10.1038/s41588-020-0578-5

    View details for PubMedID 32094911

  • Deep Characterization of the Human Antibody Response to Natural Infection Using Longitudinal Immune Repertoire Sequencing. Molecular & cellular proteomics : MCP Mitsunaga, E. M., Snyder, M. P. 2020; 19 (2): 278-293

    Abstract

    Human antibody response studies are largely restricted to periods of high immune activity (e.g. vaccination). To comprehensively understand the healthy B cell immune repertoire and how this changes over time and through natural infection, we conducted immune repertoire RNA sequencing on flow cytometry-sorted B cell subsets to profile a single individual's antibodies over 11 months through two periods of natural viral infection. We found that 1) a baseline of healthy variable (V) gene usage in antibodies exists and is stable over time, but antibodies in memory cells consistently have a different usage profile relative to earlier B cell stages; 2) a single complementarity-determining region 3 (CDR3) is potentially generated from more than one VJ gene combination; and 3) IgG and IgA antibody transcripts are found at low levels in early human B cell development, suggesting that class switching may occur earlier than previously realized. These findings provide insight into immune repertoire stability, response to natural infections, and human B cell development.

    View details for DOI 10.1074/mcp.RA119.001633

    View details for PubMedID 33451388

  • Molecular Choreography of Acute Exercise. Cell Contrepois, K., Wu, S., Moneghetti, K. J., Hornburg, D., Ahadi, S., Tsai, M. S., Metwally, A. A., Wei, E., Lee-McMullen, B., Quijada, J. V., Chen, S., Christle, J. W., Ellenberger, M., Balliu, B., Taylor, S., Durrant, M. G., Knowles, D. A., Choudhry, H., Ashland, M., Bahmani, A., Enslen, B., Amsallem, M., Kobayashi, Y., Avina, M., Perelman, D., Schüssler-Fiorenza Rose, S. M., Zhou, W., Ashley, E. A., Montgomery, S. B., Chaib, H., Haddad, F., Snyder, M. P. 2020; 181 (5): 1112–30.e16

    Abstract

    Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.

    View details for DOI 10.1016/j.cell.2020.04.043

    View details for PubMedID 32470399

  • Personal aging markers and ageotypes revealed by deep longitudinal profiling. Nature medicine Ahadi, S., Zhou, W., Schussler-Fiorenza Rose, S. M., Sailani, M. R., Contrepois, K., Avina, M., Ashland, M., Brunet, A., Snyder, M. 2020; 26 (1): 83–90

    Abstract

    The molecular changes that occur with aging are not well understood1-4. Here, we performed longitudinal and deep multiomics profiling of 106 healthy individuals from 29 to 75 years of age and examined how different types of 'omic' measurements, including transcripts, proteins, metabolites, cytokines, microbes and clinical laboratory values, correlate with age. We identified both known and new markers that associated with age, as well as distinct molecular patterns of aging in insulin-resistant as compared to insulin-sensitive individuals. In a longitudinal setting, we identified personal aging markers whose levels changed over a short time frame of 2-3 years. Further, we defined different types of aging patterns in different individuals, termed 'ageotypes', on the basis of the types of molecular pathways that changed over time in a given individual. Ageotypes may provide a molecular assessment of personal aging, reflective of personal lifestyle and medical history, that may ultimately be useful in monitoring and intervening in the aging process.

    View details for DOI 10.1038/s41591-019-0719-5

    View details for PubMedID 31932806

  • A Quantitative Proteome Map of the Human Body. Cell Jiang, L., Wang, M., Lin, S., Jian, R., Li, X., Chan, J., Dong, G., Fang, H., Robinson, A. E., Snyder, M. P. 2020

    Abstract

    Determining protein levels in each tissue and how they compare with RNA levels is important for understanding human biology and disease as well as regulatory processes that control protein levels. We quantified the relative protein levels from over 12,000 genes across 32 normal human tissues. Tissue-specific or tissue-enriched proteins were identified and compared to transcriptome data. Many ubiquitous transcripts are found to encode tissue-specific proteins. Discordance of RNA and protein enrichment revealed potential sites of synthesis and action of secreted proteins. The tissue-specific distribution of proteins also provides an in-depth view of complex biological events that require the interplay of multiple tissues. Most importantly, our study demonstrated that protein tissue-enrichment information can explain phenotypes of genetic diseases, which cannot be obtained by transcript information alone. Overall, our results demonstrate how understanding protein levels can provide insights into regulation, secretome, metabolism, and human diseases.

    View details for DOI 10.1016/j.cell.2020.08.036

    View details for PubMedID 32916130

  • Candidate variants in TUB are associated with familial tremor. PLoS genetics Sailani, M. R., Jahanbani, F., Abbott, C. W., Lee, H., Zia, A., Rego, S., Winkelmann, J., Hopfner, F., Khan, T. N., Katsanis, N., Müller, S. H., Berg, D., Lyman, K. M., Mychajliw, C., Deuschl, G., Bernstein, J. A., Kuhlenbäumer, G., Snyder, M. P. 2020; 16 (9): e1009010

    Abstract

    Essential tremor (ET) is the most common adult-onset movement disorder. In the present study, we performed whole exome sequencing of a large ET-affected family (10 affected and 6 un-affected family members) and identified a TUB p.V431I variant (rs75594955) segregating in a manner consistent with autosomal-dominant inheritance. Subsequent targeted re-sequencing of TUB in 820 unrelated individuals with sporadic ET and 630 controls revealed significant enrichment of rare nonsynonymous TUB variants (e.g. rs75594955: p.V431I, rs1241709665: p.Ile20Phe, rs55648406: p.Arg49Gln) in the ET cohort (SKAT-O test p-value = 6.20e-08). TUB encodes a transcription factor predominantly expressed in neuronal cells and has been previously implicated in obesity. ChIP-seq analyses of the TUB transcription factor across different regions of the mouse brain revealed that TUB regulates the pathways responsible for neurotransmitter production as well as thyroid hormone signaling. Together, these results support the association of rare variants in TUB with ET.

    View details for DOI 10.1371/journal.pgen.1009010

    View details for PubMedID 32956375

  • Landscape of cohesin-mediated chromatin loops in the human genome. Nature Grubert, F., Srivas, R., Spacek, D. V., Kasowski, M., Ruiz-Velasco, M., Sinnott-Armstrong, N., Greenside, P., Narasimha, A., Liu, Q., Geller, B., Sanghi, A., Kulik, M., Sa, S., Rabinovitch, M., Kundaje, A., Dalton, S., Zaugg, J. B., Snyder, M. 2020; 583 (7818): 737–43

    Abstract

    Physical interactions between distal regulatory elements have a key role in regulating gene expression, but the extent to which these interactions vary between cell types and contribute to cell-type-specific gene expression remains unclear. Here, to address these questions as part of phase III of the Encyclopedia of DNA Elements (ENCODE), we mapped cohesin-mediated chromatin loops, using chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), and analysed gene expression in 24 diverse human cell types, including core ENCODE cell lines. Twenty-eight per cent of all chromatin loops vary across cell types; these variations modestly correlate with changes in gene expression and are effective at grouping cell types according to their tissue of origin. The connectivity of genes corresponds to different functional classes, with housekeeping genes having few contacts, and dosage-sensitive genes being more connected to enhancer elements. This atlas of chromatin loops complements the diverse maps of regulatory architecture that comprise the ENCODE Encyclopedia, and will help to support emerging analyses of genome structure and function.

    View details for DOI 10.1038/s41586-020-2151-x

    View details for PubMedID 32728247

  • Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature Moore, J. E., Purcaro, M. J., Pratt, H. E., Epstein, C. B., Shoresh, N., Adrian, J., Kawli, T., Davis, C. A., Dobin, A., Kaul, R., Halow, J., Van Nostrand, E. L., Freese, P., Gorkin, D. U., Shen, Y., He, Y., Mackiewicz, M., Pauli-Behn, F., Williams, B. A., Mortazavi, A., Keller, C. A., Zhang, X. O., Elhajjajy, S. I., Huey, J., Dickel, D. E., Snetkova, V., Wei, X., Wang, X., Rivera-Mulia, J. C., Rozowsky, J., Zhang, J., Chhetri, S. B., Zhang, J., Victorsen, A., White, K. P., Visel, A., Yeo, G. W., Burge, C. B., Lécuyer, E., Gilbert, D. M., Dekker, J., Rinn, J., Mendenhall, E. M., Ecker, J. R., Kellis, M., Klein, R. J., Noble, W. S., Kundaje, A., Guigó, R., Farnham, P. J., Cherry, J. M., Myers, R. M., Ren, B., Graveley, B. R., Gerstein, M. B., Pennacchio, L. A., Snyder, M. P., Bernstein, B. E., Wold, B., Hardison, R. C., Gingeras, T. R., Stamatoyannopoulos, J. A., Weng, Z. 2020; 583 (7818): 699–710

    Abstract

    The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE1 and Roadmap Epigenomics2 data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

    View details for DOI 10.1038/s41586-020-2493-4

    View details for PubMedID 32728249

  • Deep longitudinal multiomics profiling reveals two biological seasonal patterns in California. Nature communications Sailani, M. R., Metwally, A. A., Zhou, W., Rose, S. M., Ahadi, S., Contrepois, K., Mishra, T., Zhang, M. J., Kidziński, Ł., Chu, T. J., Snyder, M. P. 2020; 11 (1): 4933

    Abstract

    The influence of seasons on biological processes is poorly understood. In order to identify biological seasonal patterns based on diverse molecular data, rather than calendar dates, we performed a deep longitudinal multiomics profiling of 105 individuals over 4 years. Here, we report more than 1000 seasonal variations in omics analytes and clinical measures. The different molecules group into two major seasonal patterns which correlate with peaks in late spring and late fall/early winter in California. The two patterns are enriched for molecules involved in human biological processes such as inflammation, immunity, cardiovascular health, as well as neurological and psychiatric conditions. Lastly, we identify molecules and microbes that demonstrate different seasonal patterns in insulin sensitive and insulin resistant individuals. The results of our study have important implications in healthcare and highlight the value of considering seasonality when assessing population wide health risk and management.

    View details for DOI 10.1038/s41467-020-18758-1

    View details for PubMedID 33004787

  • The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature Snyder, M. P., Lin, S., Posgai, A., Atkinson, M., Regev, A., Rood, J., Rozenblatt-Rosen, O., Gaffney, L., Hupalowska, A., Satija, R., Gehlenborg, N., Shendure, J., Laskin, J., Harbury, P., Nystrom, N. A., Silverstein, J. C., Bar-Joseph, Z., Zhang, K., Borner, K., Lin, Y., Conroy, R., Procaccini, D., Roy, A. L., Pillai, A., Brown, M., Galis, Z. S., Cai, L., Shendure, J., Trapnell, C., Lin, S., Jackson, D., Snyder, M. P., Nolan, G., Greenleaf, W., Lin, Y., Plevritis, S., Ahadi, S., Nevins, S. A., Lee, H., Schuerch, C., Black, S., Venkataraaman, V., Esplin, E., Horning, A., Bahmani, A., Zhang, K., Sun, X., Jain, S., Hagood, J., Pryhuber, G., Kharchenko, P., Atkinson, M., Bodenmiller, B., Brusko, T., Clare-Salzler, M., Nick, H., Otto, K., Posgai, A., Wasserfall, C., Jorgensen, M., Brusko, M., Maffioletti, S., Caprioli, R. M., Spraggins, J. M., Gutierrez, D., Patterson, N., Neumann, E. K., Harris, R., deCaestecker, M., Fogo, A. B., van de Plas, R., Lau, K., Cai, L., Yuan, G., Zhu, Q., Dries, R., Yin, P., Saka, S. K., Kishi, J. Y., Wang, Y., Goldaracena, I., Laskin, J., Ye, D., Burnum-Johnson, K. E., Piehowski, P. D., Ansong, C., Zhu, Y., Harbury, P., Desai, T., Mulye, J., Chou, P., Nagendran, M., Bar-Joseph, Z., Teichmann, S. A., Paten, B., Murphy, R. F., Ma, J., Kiselev, V., Kingsford, C., Ricarte, A., Keays, M., Akoju, S. A., Ruffalo, M., Gehlenborg, N., Kharchenko, P., Vella, M., McCallum, C., Borner, K., Cross, L. E., Friedman, S. H., Heiland, R., Herr, B., Macklin, P., Quardokus, E. M., Record, L., Sluka, J. P., Weber, G. M., Nystrom, N. A., Silverstein, J. C., Blood, P. D., Ropelewski, A. J., Shirey, W. E., Scibek, R. M., Mabee, P., Lenhardt, W., Robasky, K., Michailidis, S., Satija, R., Marioni, J., Regev, A., Butler, A., Stuart, T., Fisher, E., Ghazanfar, S., Rood, J., Gaffney, L., Eraslan, G., Biancalani, T., Vaishnav, E. D., Conroy, R., Procaccini, D., Roy, A., Pillai, A., Brown, M., Galis, Z., Srinivas, P., Pawlyk, A., Sechi, S., Wilder, E., Anderson, J., HuBMAP Consortium 2019; 574 (7777): 187–92

    Abstract

    Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.

    View details for DOI 10.1038/s41586-019-1629-x

    View details for Web of Science ID 000489784200035

    View details for PubMedID 31597973

    View details for PubMedCentralID PMC6800388

  • Big data and health. The Lancet. Digital health Snyder, M., Zhou, W. 2019; 1 (6): e252-e254

    View details for DOI 10.1016/S2589-7500(19)30109-8

    View details for PubMedID 33323249

  • HAT1 Coordinates Histone Production and Acetylation via H4 Promoter Binding. Molecular cell Gruber, J. J., Geller, B., Lipchik, A. M., Chen, J., Salahudeen, A. A., Ram, A. N., Ford, J. M., Kuo, C. J., Snyder, M. P. 2019

    Abstract

    The energetic costs of duplicating chromatin are large and therefore likely depend on nutrient sensing checkpoints and metabolic inputs. By studying chromatin modifiers regulated by epithelial growth factor, we identified histone acetyltransferase 1 (HAT1) as an induced gene that enhances proliferation through coordinating histone production, acetylation, and glucose metabolism. In addition to its canonical role as a cytoplasmic histone H4 acetyltransferase, we isolated a HAT1-containing complex bound specifically at promoters of H4 genes. HAT1-dependent transcription of H4 genes required an acetate-sensitive promoter element. HAT1 expression was critical for S-phase progression and maintenance of H3 lysine 9 acetylation at proliferation-associated genes, including histone genes. Therefore, these data describe a feedforward circuit whereby HAT1 captures acetyl groups on nascent histones and drives H4 production by chromatin binding to support chromatin replication and acetylation. These findings have important implications for human disease, since high HAT1 levels associate with poor outcomes across multiple cancer types.

    View details for DOI 10.1016/j.molcel.2019.05.034

    View details for PubMedID 31278053

  • The Integrative Human Microbiome Project NATURE Proctor, L. M., Creasy, H. H., Fettweis, J. M., Lloyd-Price, J., Mahurkar, A., Zhou, W., Buck, G. A., Snyder, M. P., Strauss, J. F., Weinstock, G. M., White, O., Huttenhower, C., Integrative HMP iHMP Res Network 2019; 569 (7758): 641–48

    Abstract

    The NIH Human Microbiome Project (HMP) has been carried out over ten years and two phases to provide resources, methods, and discoveries that link interactions between humans and their microbiomes to health-related outcomes. The recently completed second phase, the Integrative Human Microbiome Project, comprised studies of dynamic changes in the microbiome and host under three conditions: pregnancy and preterm birth; inflammatory bowel diseases; and stressors that affect individuals with prediabetes. The associated research begins to elucidate mechanisms of host-microbiome interactions under these conditions, provides unique data resources (at the HMP Data Coordination Center), and represents a paradigm for future multi-omic studies of the human microbiome.

    View details for DOI 10.1038/s41586-019-1238-8

    View details for Web of Science ID 000470144100031

    View details for PubMedID 31142853

  • Longitudinal multi-omics of host-microbe dynamics in prediabetes. Nature Zhou, W., Sailani, M. R., Contrepois, K., Zhou, Y., Ahadi, S., Leopold, S. R., Zhang, M. J., Rao, V., Avina, M., Mishra, T., Johnson, J., Lee-McMullen, B., Chen, S., Metwally, A. A., Tran, T. D., Nguyen, H., Zhou, X., Albright, B., Hong, B., Petersen, L., Bautista, E., Hanson, B., Chen, L., Spakowicz, D., Bahmani, A., Salins, D., Leopold, B., Ashland, M., Dagan-Rosenfeld, O., Rego, S., Limcaoco, P., Colbert, E., Allister, C., Perelman, D., Craig, C., Wei, E., Chaib, H., Hornburg, D., Dunn, J., Liang, L., Rose, S. M., Kukurba, K., Piening, B., Rost, H., Tse, D., McLaughlin, T., Sodergren, E., Weinstock, G. M., Snyder, M. 2019; 569 (7758): 663–71

    Abstract

    Type 2 diabetes mellitus (T2D) is a growing health problem, but little is known about its early disease stages, its effects on biological processes or the transition to clinical T2D. To understand the earliest stages of T2D better, we obtained samples from 106 healthy individuals and individuals with prediabetes over approximately four years and performed deep profiling of transcriptomes, metabolomes, cytokines, and proteomes, as well as changes in the microbiome. This rich longitudinal data set revealed many insights: first, healthy profiles are distinct among individuals while displaying diverse patterns of intra- and/or inter-personal variability. Second, extensive host and microbial changes occur during respiratory viral infections and immunization, and immunization triggers potentially protective responses that are distinct from responses to respiratory viral infections. Moreover, during respiratory viral infections, insulin-resistant participants respond differently than insulin-sensitive participants. Third, global co-association analyses among the thousands of profiled molecules reveal specific host-microbe interactions that differ between insulin-resistant and insulin-sensitive individuals. Last, we identified early personal molecular signatures in one individual that preceded the onset of T2D, including the inflammation markers interleukin-1 receptor antagonist (IL-1RA) and high-sensitivity C-reactive protein (CRP) paired with xenobiotic-induced immune signalling. Our study reveals insights into pathways and responses that differ between glucose-dysregulated and healthy individuals during health and disease and provides an open-access data resource to enable further research into healthy, prediabetic and T2D states.

    View details for DOI 10.1038/s41586-019-1236-x

    View details for PubMedID 31142858

  • Gene-Environment Interaction in the Era of Precision Medicine CELL Li, J., Li, X., Zhang, S., Snyder, M. 2019; 177 (1): 38–44
  • A longitudinal big data approach for precision health. Nature medicine Schüssler-Fiorenza Rose, S. M., Contrepois, K., Moneghetti, K. J., Zhou, W., Mishra, T., Mataraso, S., Dagan-Rosenfeld, O., Ganz, A. B., Dunn, J., Hornburg, D., Rego, S., Perelman, D., Ahadi, S., Sailani, M. R., Zhou, Y., Leopold, S. R., Chen, J., Ashland, M., Christle, J. W., Avina, M., Limcaoco, P., Ruiz, C., Tan, M., Butte, A. J., Weinstock, G. M., Slavich, G. M., Sodergren, E., McLaughlin, T. L., Haddad, F., Snyder, M. P. 2019; 25 (5): 792–804

    Abstract

    Precision health relies on the ability to assess disease risk at an individual level, detect early preclinical conditions and initiate preventive strategies. Recent technological advances in omics and wearable monitoring enable deep molecular and physiological profiling and may provide important tools for precision health. We explored the ability of deep longitudinal profiling to make health-related discoveries, identify clinically relevant molecular pathways and affect behavior in a prospective longitudinal cohort (n = 109) enriched for risk of type 2 diabetes mellitus. The cohort underwent integrative personalized omics profiling from samples collected quarterly for up to 8 years (median, 2.8 years) using clinical measures and emerging technologies including genome, immunome, transcriptome, proteome, metabolome, microbiome and wearable monitoring. We discovered more than 67 clinically actionable health discoveries and identified multiple molecular pathways associated with metabolic, cardiovascular and oncologic pathophysiology. We developed prediction models for insulin resistance by using omics measurements, illustrating their potential to replace burdensome tests. Finally, study participation led the majority of participants to implement diet and exercise changes. Altogether, we conclude that deep longitudinal profiling can lead to actionable health discoveries and provide relevant information for precision health.

    View details for PubMedID 31068711

  • Chromatin Remodeling in Response to BRCA2-Crisis. Cell reports Gruber, J. J., Chen, J., Geller, B., Jäger, N., Lipchik, A. M., Wang, G., Kurian, A. W., Ford, J. M., Snyder, M. P. 2019; 28 (8): 2182–93.e6

    Abstract

    Individuals with a single functional copy of the BRCA2 tumor suppressor have elevated risks for breast, ovarian, and other solid tumor malignancies. The exact mechanisms of carcinogenesis due to BRCA2 haploinsufficiency remain unclear, but one possibility is that at-risk cells are subject to acute periods of decreased BRCA2 availability and function ("BRCA2-crisis"), which may contribute to disease. Here, we establish an in vitro model for BRCA2-crisis that demonstrates chromatin remodeling and activation of an NF-κB survival pathway in response to transient BRCA2 depletion. Mechanistically, we identify BRCA2 chromatin binding, histone acetylation, and associated transcriptional activity as critical determinants of the epigenetic response to BRCA2-crisis. These chromatin alterations are reflected in transcriptional profiles of pre-malignant tissues from BRCA2 carriers and, therefore, may reflect natural steps in human disease. By modeling BRCA2-crisis in vitro, we have derived insights into pre-neoplastic molecular alterations that may enhance the development of preventative therapies.

    View details for DOI 10.1016/j.celrep.2019.07.057

    View details for PubMedID 31433991

  • The NASA Twins Study: A multidimensional analysis of a year-long human spaceflight. Science (New York, N.Y.) Garrett-Bakelman, F. E., Darshi, M., Green, S. J., Gur, R. C., Lin, L., Macias, B. R., McKenna, M. J., Meydan, C., Mishra, T., Nasrini, J., Piening, B. D., Rizzardi, L. F., Sharma, K., Siamwala, J. H., Taylor, L., Vitaterna, M. H., Afkarian, M., Afshinnekoo, E., Ahadi, S., Ambati, A., Arya, M., Bezdan, D., Callahan, C. M., Chen, S., Choi, A. M., Chlipala, G. E., Contrepois, K., Covington, M., Crucian, B. E., De Vivo, I., Dinges, D. F., Ebert, D. J., Feinberg, J. I., Gandara, J. A., George, K. A., Goutsias, J., Grills, G. S., Hargens, A. R., Heer, M., Hillary, R. P., Hoofnagle, A. N., Hook, V. Y., Jenkinson, G., Jiang, P., Keshavarzian, A., Laurie, S. S., Lee-McMullen, B., Lumpkins, S. B., MacKay, M., Maienschein-Cline, M. G., Melnick, A. M., Moore, T. M., Nakahira, K., Patel, H. H., Pietrzyk, R., Rao, V., Saito, R., Salins, D. N., Schilling, J. M., Sears, D. D., Sheridan, C. K., Stenger, M. B., Tryggvadottir, R., Urban, A. E., Vaisar, T., Van Espen, B., Zhang, J., Ziegler, M. G., Zwart, S. R., Charles, J. B., Kundrot, C. E., Scott, G. B., Bailey, S. M., Basner, M., Feinberg, A. P., Lee, S. M., Mason, C. E., Mignot, E., Rana, B. K., Smith, S. M., Snyder, M. P., Turek, F. W. 2019; 364 (6436)

    Abstract

    To understand the health impact of long-duration spaceflight, one identical twin astronaut was monitored before, during, and after a 1-year mission onboard the International Space Station; his twin served as a genetically matched ground control. Longitudinal assessments identified spaceflight-specific changes, including decreased body mass, telomere elongation, genome instability, carotid artery distension and increased intima-media thickness, altered ocular structure, transcriptional and metabolic changes, DNA methylation changes in immune and oxidative stress-related pathways, gastrointestinal microbiota alterations, and some cognitive decline postflight. Although average telomere length, global gene expression, and microbiome changes returned to near preflight levels within 6 months after return to Earth, increased numbers of short telomeres were observed and expression of some genes was still disrupted. These multiomic, molecular, physiological, and behavioral datasets provide a valuable roadmap of the putative health risks for future human spaceflight.

    View details for PubMedID 30975860

  • High-Resolution Bisulfite-Sequencing of Peripheral Blood DNA Methylation in Early-Onset and Familial Risk Breast Cancer Patients. Clinical cancer research : an official journal of the American Association for Cancer Research Chen, J., Haanpää, M. K., Gruber, J. J., Jäger, N., Ford, J. M., Snyder, M. P. 2019

    Abstract

    Understanding and explaining hereditary predisposition to cancer has focused on the genetic etiology of the disease. However, mutations in known genes associated with breast cancer, such as BRCA1 and BRCA2, account for less than 25% of familial cases of breast cancer. Recently, specific epigenetic modifications at BRCA1 have been shown to promote hereditary breast cancer, but the broader potential for epigenetic contribution to hereditary breast cancer is not yet well understood. We examined DNA methylation through deep bisulfite sequencing of CpG islands and known promoter or regulatory regions in peripheral blood DNA from 99 familial or early-onset breast or ovarian cancer patients, 6 unaffected BRCA-mutation carriers, and 49 unaffected controls. In 9% of patients, we observed altered methylation in the promoter regions of genes known to be involved in cancer, including hypermethylation at the tumor suppressor PTEN and hypomethylation at the proto-oncogene TEX14. These alterations occur in the form of allelic methylation that spans up to hundreds of base-pairs in length. Our observations suggest a broader role for DNA methylation in early-onset, familial risk breast cancer. Further studies are warranted to clarify these mechanisms and the benefits of DNA methylation screening for early risk prediction of familial cancers.

    View details for DOI 10.1158/1078-0432.CCR-18-2423

    View details for PubMedID 31175093

  • Metformin Affects Heme Function as a Possible Mechanism of Action. G3 (Bethesda, Md.) Li, X., Wang, X., Snyder, M. P. 2018

    Abstract

    Metformin elicits pleiotropic effects that are beneficial for treating diabetes, as well as particular cancers and aging. In spite of its importance, a convincing and unifying mechanism to explain how metformin operates is lacking. Here we describe investigations into the mechanism of metformin action through heme and hemoprotein(s). Metformin suppresses heme production by 50% in yeast, and this suppression requires mitochondrial function, which is necessary for heme synthesis. At high concentrations comparable to those in the clinic, metformin also suppresses heme production in human erythrocytes, erythropoietic cells and hepatocytes by 30-50%; the heme-targeting drug artemisinin operates at a greater potency. Significantly, metformin prevents oxidation of heme in three protein scaffolds, cytochrome c, myoglobin and hemoglobin, with Kd values < 3 mM suggesting a dual oxidation and reduction role in the regulation of heme redox transition. Since heme- and porphyrin-like groups operate in diverse enzymes that control important metabolic processes, we suggest that metformin acts, at least in part, through stabilizing appropriate redox states in heme and other porphyrin-containing groups to control cellular metabolism.

    View details for PubMedID 30554148

  • High Frequency Actionable Pathogenic Exome Variants in an Average-Risk Cohort. Cold Spring Harbor molecular case studies Rego, S., Dagan-Rosenfeld, O., Zhou, W., Sailani, M. R., Limcaoco, P., Colbert, E., Avina, M., Wheeler, J., Craig, C., Salins, D., Rost, H. L., Dunn, J., McLaughlin, T., Steinmetz, L. M., Bernstein, J. A., Snyder, M. P. 2018

    Abstract

    Exome sequencing is increasingly utilized in both clinical and non-clinical settings, but little is known about its utility in healthy individuals. Most previous studies on this topic have examined a small subset of genes known to be implicated in human disease and/or have used automated pipelines to assess pathogenicity of known variants. In order to determine the frequency of both medically actionable and non-actionable but medically relevant exome findings in the general population we assessed the exomes of 70 participants who have been extensively characterized over the past several years as part of a longitudinal integrated multi-omics profiling study. We analyzed exomes by identifying rare likely pathogenic and pathogenic variants in genes associated with Mendelian disease in the Online Mendelian Inheritance in Man (OMIM) database. We then used American College of Medical Genetics (ACMG) guidelines for the classification of rare sequence variants. Additionally, we assessed pharmacogenetic variants. Twelve out of 70 (17%) participants had medically actionable findings in Mendelian disease genes. Five had phenotypes or family histories associated with their genetic variants. The frequency of actionable variants is higher than that reported in most previous studies and suggests added benefit from utilizing expanded gene lists and manual curation to assess actionable findings. A total of 63 participants (90%) had additional non-actionable findings, including 60 who were found to be carriers for recessive diseases and 21 who have increased Alzheimer's disease risk due to heterozygous or homozygous APOE e4 alleles (18 participants had both). Our results suggest that exome sequencing may have considerably more utility for health management in the general population than previously thought.

    View details for PubMedID 30487145

  • Longitudinal personal DNA methylome dynamics in a human with a chronic condition. Nature medicine Chen, R., Xia, L., Tu, K., Duan, M., Kukurba, K., Li-Pook-Than, J., Xie, D., Snyder, M. 2018

    Abstract

    Epigenomics regulates gene expression and is as important as genomics in precision personal health, as it is heavily influenced by environment and lifestyle. We profiled whole-genome DNA methylation and the corresponding transcriptome of peripheral blood mononuclear cells collected from a human volunteer over a period of 36 months, generating 28 methylome and 57 transcriptome datasets. We found that DNA methylomic changes are associated with infrequent glucose level alteration, whereas the transcriptome underwent dynamic changes during events such as viral infections. Most DNA meta-methylome changes occurred 80-90 days before clinically detectable glucose elevation. Analysis of the deep personal methylome dataset revealed an unprecedented number of allelic differentially methylated regions that remain stable longitudinally and are preferentially associated with allele-specific gene regulation. Our results revealed that changes in different types of 'omics' data associate with different physiological aspects of this individual: DNA methylation with chronic conditions and transcriptome with acute events.

    View details for PubMedID 30397358

  • Dynamic Human Environmental Exposome Revealed by Longitudinal Personal Monitoring. Cell Jiang, C., Wang, X., Li, X., Inlora, J., Wang, T., Liu, Q., Snyder, M. 2018; 175 (1): 277

    Abstract

    Human health is dependent upon environmental exposures, yet the diversity and variation in exposures are poorly understood. We developed a sensitive method to monitor personal airborne biological and chemical exposures and followed the personal exposomes of 15 individuals for up to 890 days and over 66 distinct geographical locations. We found that individuals are potentially exposed to thousands of pan-domain species and chemical compounds, including insecticides and carcinogens. Personal biological and chemical exposomes are highly dynamic and vary spatiotemporally, even for individuals located in the same general geographical region. Integrated analysis of biological and chemical exposomes revealed strong location-dependent relationships. Finally, construction of an exposome interaction network demonstrated the presence of distinct yet interconnected human- and environment-centric clouds, comprised of interacting ecosystems such as human, flora, pets, and arthropods. Overall, we demonstrate that human exposomes are diverse, dynamic, spatiotemporally-driven interaction networks with the potential to impact human health.

    View details for PubMedID 30241608

  • Decoding the Genomics of Abdominal Aortic Aneurysm. Cell Li, J., Pan, C., Zhang, S., Spin, J. M., Deng, A., Leung, L. L., Dalman, R. L., Tsao, P. S., Snyder, M. 2018; 174 (6): 1361

    Abstract

    A key aspect of genomic medicine is to make individualized clinical decisions from personal genomes. We developed a machine-learning framework to integrate personal genomes and electronic health record (EHR) data and used this framework to study abdominal aortic aneurysm (AAA), a prevalent irreversible cardiovascular disease with unclear etiology. Performing whole-genome sequencing on AAA patients and controls, we demonstrated its predictive precision solely from personal genomes. By modeling personal genomes with EHRs, this framework quantitatively assessed the effectiveness of adjusting personal lifestyles given personal genome baselines, demonstrating its utility as a personal health management tool. We showed that this new framework agnostically identified genetic components involved in AAA, which were subsequently validated in human aortic tissues and in murine models. Our study presents a new framework for disease genome analysis, which can be used for both health management and understanding the biological architecture of complex diseases.

    View details for PubMedID 30193110

  • Glucotypes reveal new patterns of glucose dysregulation. PLoS biology Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., McLaughlin, T., Snyder, M. 2018; 16 (7): e2005143

    Abstract

    Diabetes is an increasing problem worldwide; almost 30 million people, nearly 10% of the population, in the United States are diagnosed with diabetes. Another 84 million are prediabetic, and without intervention, up to 70% of these individuals may progress to type 2 diabetes. Current methods for quantifying blood glucose dysregulation in diabetes and prediabetes are limited by reliance on single-time-point measurements or on average measures of overall glycemia and neglect glucose dynamics. We have used continuous glucose monitoring (CGM) to evaluate the frequency with which individuals demonstrate elevations in postprandial glucose, the types of patterns, and how patterns vary between individuals given an identical nutrient challenge. Measurement of insulin resistance and secretion highlights the fact that the physiology underlying dysglycemia is highly variable between individuals. We developed an analytical framework that can group individuals according to specific patterns of glycemic responses called "glucotypes" that reveal heterogeneity, or subphenotypes, within traditional diagnostic categories of glucose regulation. Importantly, we found that even individuals considered normoglycemic by standard measures exhibit high glucose variability using CGM, with glucose levels reaching prediabetic and diabetic ranges 15% and 2% of the time, respectively. We thus show that glucose dysregulation, as characterized by CGM, is more prevalent and heterogeneous than previously thought and can affect individuals considered normoglycemic by standard measures, and specific patterns of glycemic responses reflect variable underlying physiology. The interindividual variability in glycemic responses to standardized meals also highlights the personal nature of glucose regulation. Through extensive phenotyping, we developed a model for identifying potential mechanisms of personal glucose dysregulation and built a webtool for visualizing a user-uploaded CGM profile and classifying individualized glucose patterns into glucotypes.

    View details for PubMedID 30040822

  • Natural Selection Has Differentiated the Progesterone Receptor among Human Populations. American journal of human genetics Li, J., Hong, X., Mesiano, S., Muglia, L. J., Wang, X., Snyder, M., Stevenson, D. K., Shaw, G. M. 2018

    Abstract

    The progesterone receptor (PGR) plays a central role in maintaining pregnancy and is significantly associated with medical conditions such as preterm birth, which affects 12.6% of all births in the U.S. PGR has been evolving rapidly since the common ancestor of human and chimpanzee, and we herein investigated evolutionary dynamics of PGR during recent human migration and population differentiation. Our study revealed substantial population differentiation at the PGR locus driven by natural selection, where very recent positive selection in East Asians has substantially decreased its genetic diversity by nearly fixing evolutionarily novel alleles. On the contrary, in European populations, the PGR locus has been promoted to a highly polymorphic state likely due to balancing selection. Integrating transcriptome data across multiple tissue types together with large-scale genome-wide association data for preterm birth, our study demonstrated the consequence of the selection event in East Asians on remodeling PGR expression specifically in the ovary and determined a significant association of early spontaneous preterm birth with the evolutionarily selected variants. To reconstruct its evolutionary trajectory on the human lineage, we observed substantial differentiation between modern and archaic humans at the PGR locus, including fixation of a deleterious missense allele in the Neanderthal genome that was later introgressed in modern human populations. Taken together, our study revealed substantial evolutionary innovation in PGR even during very recent human evolution, and its different forms among human populations likely result in differential susceptibility to progesterone-associated disease conditions including preterm birth.

    View details for PubMedID 29937092

  • Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining JOURNAL OF PROTEOME RESEARCH Yu, K., Lee, T., Wan, C., Chen, Y., Re, C., Kou, S. C., Chiang, J., Kohane, I. S., Snyder, M. 2018; 17 (4): 1383–96

    Abstract

    There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.

    View details for PubMedID 29505266

  • Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome GENOME RESEARCH Tilgner, H., Jahanbani, F., Gupta, I., Collier, P., Wei, E., Rasmussen, M., Snyder, M. 2018; 28 (2): 231–42

    Abstract

    Understanding transcriptome complexity is crucial for understanding human biology and disease. Technologies such as Synthetic long-read RNA sequencing (SLR-RNA-seq) delivered 5 million isoforms and allowed assessing splicing coordination. Pacific Biosciences and Oxford Nanopore increase throughput also but require high input amounts or amplification. Our new droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10-100 million RNA molecules. SpISO-seq requires less than 1 ng of input cDNA, limiting or removing the need for prior amplification with its associated biases. Adjusting the number of reads devoted to each molecule reduces sequencing lanes and cost, with little loss in detection power. The increased number of molecules expands our understanding of isoform complexity. In addition to confirming our previously published cases of splicing coordination (e.g., BIN1), the greater depth reveals many new cases, such as MAPT. Coordination of internal exons is found to be extensive among protein coding genes: 23.5%-59.3% (95% confidence interval) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for noncoding sequences, suggesting a larger role of splicing coordination in shaping proteins. Groups of genes with coordination are involved in protein-protein interactions with each other, raising the possibility that coordination facilitates complex formation and/or function. We also find new splicing coordination types, involving initial and terminal exons. Our results provide a more comprehensive understanding of the human transcriptome and a general, cost-effective method to analyze it.

    View details for PubMedID 29196558

    View details for PubMedCentralID PMC5793787

  • A genome-wide association study identifies only two ancestry specific variants associated with spontaneous preterm birth SCIENTIFIC REPORTS Rappoport, N., Toung, J., Hadley, D., Wong, R. J., Fujioka, K., Reuter, J., Abbott, C. W., Oh, S., Hu, D., Eng, C., Huntsman, S., Bodian, D. L., Niederhuber, J. E., Hong, X., Zhang, G., Sikora-Wohfeld, W., Gignoux, C. R., Wang, H., Oehlert, J., Jelliffe-Pawlowski, L. L., Gould, J. B., Darmstadt, G. L., Wang, X., Bustamante, C. D., Snyder, M. P., Ziv, E., Patsopoulos, N. A., Muglia, L. J., Burchard, E., Shaw, G. M., O'Brodovich, H. M., Stevenson, D. K., Butte, A. J., Sirota, M. 2018; 8: 226

    Abstract

    Preterm birth (PTB), or the delivery prior to 37 weeks of gestation, is a significant cause of infant morbidity and mortality. Although twin studies estimate that maternal genetic contributions account for approximately 30% of the incidence of PTB, and other studies reported fetal gene polymorphism association, to date no consistent associations have been identified. In this study, we performed the largest reported genome-wide association study analysis on 1,349 cases of PTB and 12,595 ancestry-matched controls from the focusing on genomic fetal signals. We tested over 2 million single nucleotide polymorphisms (SNPs) for associations with PTB across five subpopulations: African (AFR), the Americas (AMR), European, South Asian, and East Asian. We identified only two intergenic loci associated with PTB at a genome-wide level of significance: rs17591250 (P = 4.55E-09) on chromosome 1 in the AFR population and rs1979081 (P = 3.72E-08) on chromosome 8 in the AMR group. We have queried several existing replication cohorts and found no support for these associations. We conclude that the fetal genetic contribution to PTB is unlikely to be due to a single common genetic variant, but could be explained by interactions of multiple common variants, or of rare variants affected by environmental influences, none of which is detectable using a GWAS alone.

    View details for PubMedID 29317701

  • Integrative Personal Omics Profiles during Periods of Weight Gain and Loss. Cell systems Piening, B. D., Zhou, W., Contrepois, K., Röst, H., Gu Urban, G. J., Mishra, T., Hanson, B. M., Bautista, E. J., Leopold, S., Yeh, C. Y., Spakowicz, D., Banerjee, I., Chen, C., Kukurba, K., Perelman, D., Craig, C., Colbert, E., Salins, D., Rego, S., Lee, S., Zhang, C., Wheeler, J., Sailani, M. R., Liang, L., Abbott, C., Gerstein, M., Mardinoglu, A., Smith, U., Rubin, D. L., Pitteri, S., Sodergren, E., McLaughlin, T. L., Weinstock, G. M., Snyder, M. P. 2018

    Abstract

    Advances in omics technologies now allow an unprecedented level of phenotyping for human diseases, including obesity, in which individual responses to excess weight are heterogeneous and unpredictable. To aid the development of a better understanding of these phenotypes, we performed a controlled longitudinal weight perturbation study combining multiple omics strategies (genomics, transcriptomics, multiple proteomics assays, metabolomics, and microbiomics) during periods of weight gain and loss in humans. Results demonstrated that: (1) weight gain is associated with the activation of strong inflammatory and hypertrophic cardiomyopathy signatures in blood; (2) although weight loss reverses some changes, a number of signatures persist, indicative of long-term physiologic changes; (3) we observed omics signatures associated with insulin resistance that may serve as novel diagnostics; (4) specific biomolecules were highly individualized and stable in response to perturbations, potentially representing stable personalized markers. Most data are available open access and serve as a valuable resource for the community.

    View details for PubMedID 29361466

  • Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma CELL SYSTEMS Yu, K., Berry, G. J., Rubin, D. L., Re, C., Altman, R. B., Snyder, M. 2017; 5 (6): 620-+

    Abstract

    Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable for its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. Here, we obtained H&E-stained whole-slide histopathology images, pathology reports, RNA sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas and used these to identify molecular pathways associated with histopathology patterns. We report cell-cycle regulation and nucleotide binding pathways underpinning tumor cell dedifferentiation, and we predicted histology grade using transcriptomics and proteomics signatures (area under curve >0.80). We built an integrative histopathology-transcriptomics model to generate better prognostic predictions for stage I patients (p = 0.0182 ± 0.0021) compared with gene expression or histopathology studies alone, and the results were replicated in an independent cohort (p = 0.0220 ± 0.0070). These results motivate the integration of histopathology and omics data to investigate molecular mechanisms of pathology findings and enhance clinical prognostic prediction.

    View details for PubMedID 29153840

    View details for PubMedCentralID PMC5746468

  • Plasma sterols and depressive symptom severity in a population-based cohort PLOS ONE Cenik, B., Cenik, C., Snyder, M. P., Brown, E. 2017; 12 (9): e0184382

    Abstract

    Convergent evidence strongly suggests major depressive disorder is heterogeneous in its etiology and clinical characteristics. Depression biomarkers hold potential for identifying etiological subtypes, improving diagnostic accuracy, predicting treatment response, and personalizing treatment. Human plasma contains numerous sterols that have not been systematically studied. Changes in cholesterol concentrations have been implicated in suicide and depression, suggesting plasma sterols may be depression biomarkers. Here, we investigated associations between plasma levels of 34 sterols (measured by mass spectrometry) and scores on the Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR16) scale in 3117 adult participants in the Dallas Heart Study, an ethnically diverse, population-based cohort. We built a random forest model using feature selection from a pool of 43 variables including demographics, general health indicators, and sterol concentrations. This model comprised 19 variables, 13 of which were sterol concentrations, and explained 15.5% of the variation in depressive symptoms. Desmosterol concentrations below the fifth percentile (1.9 ng/mL, OR 1.9, 95% CI 1.2-2.9) were significantly associated with depressive symptoms of at least moderate severity (QIDS-SR16 score ≥10.5). This is the first study reporting a novel association between plasma concentrations of cholesterol precursors and depressive symptom severity.

    View details for PubMedID 28886149

  • Fetal de novo mutations and preterm birth. PLoS genetics Li, J., Oehlert, J., Snyder, M., Stevenson, D. K., Shaw, G. M. 2017; 13 (4)

    Abstract

    Preterm birth (PTB) affects ~12% of pregnancies in the US. Despite its high mortality and morbidity, the molecular etiology underlying PTB has been unclear. Numerous studies have been devoted to identifying genetic factors in maternal and fetal genomes, but so far few genomic loci have been associated with PTB. By analyzing whole-genome sequencing data from 816 trio families, for the first time, we observed the role of fetal de novo mutations in PTB. We observed a significant increase in de novo mutation burden in PTB fetal genomes. Our genomic analyses further revealed that genes affected by PTB de novo mutations were dosage sensitive and intolerant to genomic deletions, and that their mouse orthologs were likely developmentally essential. These genes were significantly involved in early fetal brain development, which was further supported by our analysis of copy number variants identified from an independent PTB cohort. Our study indicates a new mechanism in PTB occurrence independently contributed from fetal genomes, and thus opens a new avenue for future PTB research.

    View details for DOI 10.1371/journal.pgen.1006689

    View details for PubMedID 28388617

  • De novo and rare mutations in the HSPA1L heat shock gene associated with inflammatory bowel disease GENOME MEDICINE Takahashi, S., Andreoletti, G., Chen, R., Munehira, Y., Batra, A., Afzal, N. A., Beattie, R. M., Bernstein, J. A., Ennis, S., Snyder, M. 2017; 9

    Abstract

    Inflammatory bowel disease (IBD) is a chronic, relapsing inflammatory disease of the gastrointestinal tract which includes ulcerative colitis and Crohn's disease. Genetic risk factors for IBD are not well understood. We performed a family-based whole exome sequencing (WES) analysis on a core family (Family A) to identify potential causal mutations and then analyzed exome data from a Caucasian pediatric cohort (136 patients and 106 controls) to validate the presence of mutations in the candidate gene, heat shock 70 kDa protein 1-like (HSPA1L). Biochemical assays of the de novo and rare (minor allele frequency, MAF < 0.01) mutation variant proteins further validated the predicted deleterious effects of the identified alleles. In the proband of Family A, we found a heterozygous de novo mutation (c.830C > T; p.Ser277Leu) in HSPA1L. Through analysis of WES data of 136 patients, we identified five additional rare HSPA1L mutations (p.Gly77Ser, p.Leu172del, p.Thr267Ile, p.Ala268Thr, p.Glu558Asp) in six patients. In contrast, rare HSPA1L mutations were not observed in controls, and were significantly enriched in patients (P = 0.02). Interestingly, we did not find non-synonymous rare mutations in the HSP70 isoforms HSPA1A and HSPA1B. Biochemical assays revealed that all six rare HSPA1L variant proteins showed decreased chaperone activity in vitro. Moreover, three variants demonstrated dominant negative effects on HSPA1L and HSPA1A protein activity. Our results indicate that de novo and rare mutations in HSPA1L are associated with IBD and provide insights into the pathogenesis of IBD, and also expand our understanding of the roles of HSP70s in human disease.

    View details for DOI 10.1186/s13073-016-0394-9

    View details for PubMedID 28126021

  • Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information. PLoS biology Li, X., Dunn, J., Salins, D., Zhou, G., Zhou, W., Schüssler-Fiorenza Rose, S. M., Perelman, D., Colbert, E., Runge, R., Rego, S., Sonecha, R., Datta, S., McLaughlin, T., Snyder, M. P. 2017; 15 (1)

    Abstract

    A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.

    View details for DOI 10.1371/journal.pbio.2001402

    View details for PubMedID 28081144

  • Static and Dynamic DNA Loops form AP-1-Bound Activation Hubs during Macrophage Development. Molecular cell Phanstiel, D. H., Van Bortle, K., Spacek, D., Hess, G. T., Shamim, M. S., Machol, I., Love, M. I., Aiden, E. L., Bassik, M. C., Snyder, M. P. 2017; 67 (6): 1037–48.e6

    Abstract

    The three-dimensional arrangement of the human genome comprises a complex network of structural and regulatory chromatin loops important for coordinating changes in transcription during human development. To better understand the mechanisms underlying context-specific 3D chromatin structure and transcription during cellular differentiation, we generated comprehensive in situ Hi-C maps of DNA loops in human monocytes and differentiated macrophages. We demonstrate that dynamic looping events are regulatory rather than structural in nature and uncover widespread coordination of dynamic enhancer activity at preformed and acquired DNA loops. Enhancer-bound loop formation and enhancer activation of preformed loops together form multi-loop activation hubs at key macrophage genes. Activation hubs connect 3.4 enhancers per promoter and exhibit a strong enrichment for activator protein 1 (AP-1)-binding events, suggesting that multi-loop activation hubs involving cell-type-specific transcription factors represent an important class of regulatory chromatin structures for the spatiotemporal control of transcription.

    View details for PubMedID 28890333

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers. Cell stem cell Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2016

    Abstract

    In familial pulmonary arterial hypertension (FPAH), the autosomal dominant disease-causing BMPR2 mutation is only 20% penetrant, suggesting that genetic variation provides modifiers that alleviate the disease. Here, we used comparison of induced pluripotent stem cell-derived endothelial cells (iPSC-ECs) from three families with unaffected mutation carriers (UMCs), FPAH patients, and gender-matched controls to investigate this variation. Our analysis identified features of UMC iPSC-ECs related to modifiers of BMPR2 signaling or to differentially expressed genes. FPAH-iPSC-ECs showed reduced adhesion, survival, migration, and angiogenesis compared to UMC-iPSC-ECs and control cells. The "rescued" phenotype of UMC cells was related to an increase in specific BMPR2 activators and/or a reduction in inhibitors, and the improved cell adhesion could be attributed to preservation of related signaling. The improved survival was related to increased BIRC3 and was independent of BMPR2. Our findings therefore highlight protective modifiers for FPAH that could help inform development of future treatment strategies.

    View details for DOI 10.1016/j.stem.2016.08.019

    View details for PubMedID 28017794

  • Simul-seq: combined DNA and RNA sequencing for whole-genome and transcriptome profiling. Nature methods Reuter, J. A., Spacek, D. V., Pai, R. K., Snyder, M. P. 2016; 13 (11): 953-958

    Abstract

    Paired DNA and RNA profiling is increasingly employed in genomics research to uncover molecular mechanisms of disease and to explore personal genotype and phenotype correlations. Here, we introduce Simul-seq, a technique for the production of high-quality whole-genome and transcriptome sequencing libraries from small quantities of cells or tissues. We apply the method to laser-capture-microdissected esophageal adenocarcinoma tissue, revealing a highly aneuploid tumor genome with extensive blocks of increased homozygosity and corresponding increases in allele-specific expression. Among this widespread allele-specific expression, we identify germline polymorphisms that are associated with response to cancer therapies. We further leverage this integrative data to uncover expressed mutations in several known cancer genes as well as a recurrent mutation in the motor domain of KIF3B that significantly affects kinesin-microtubule interactions. Simul-seq provides a new streamlined approach for generating comprehensive genome and transcriptome profiles from limited quantities of clinically relevant samples.

    View details for DOI 10.1038/nmeth.4028

    View details for PubMedID 27723755

  • Yeast longevity promoted by reversing aging-associated decline in heavy isotope content. NPJ aging and mechanisms of disease Li, X., Snyder, M. P. 2016; 2: 16004

    Abstract

    Dysregulation of metabolism develops with organismal aging. Both genetic and environmental manipulations promote longevity by effectively diverting various metabolic processes against aging. How these processes converge on the metabolome is not clear. Here we report that the heavy isotopic forms of common elements, a universal feature of metabolites, decline in yeast cells undergoing chronological aging. Supplementation of deuterium, a heavy hydrogen isotope, through heavy water (D2O) uptake extends yeast chronological lifespan (CLS) by up to 85% with minimal effects on growth. The CLS extension by D2O bypasses several known genetic regulators, but is abrogated by calorie restriction and mitochondrial deficiency. Heavy water substantially suppresses endogenous generation of reactive oxygen species (ROS) and slows the pace of metabolic consumption and disposal. Protection from aging by heavy isotopes might result from kinetic modulation of biochemical reactions. Altogether, our findings reveal a novel perspective of aging and new means for promoting longevity.

    View details for DOI 10.1038/npjamd.2016.4

    View details for PubMedID 28721263

    View details for PubMedCentralID PMC5515009

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations NATURE GENETICS Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125

    Abstract

    Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

    View details for DOI 10.1038/ng.3471

    View details for Web of Science ID 000369043900008

  • Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nature biotechnology Kuleshov, V., Jiang, C., Zhou, W., Jahanbani, F., Batzoglou, S., Snyder, M. 2016; 34 (1): 64-69

    Abstract

    Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

    View details for DOI 10.1038/nbt.3416

    View details for PubMedID 26655498

    View details for PubMedCentralID PMC4884093

  • Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature communications Yu, K., Zhang, C., Berry, G. J., Altman, R. B., Ré, C., Rubin, D. L., Snyder, M. 2016; 7: 12474

    Abstract

    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs.

    View details for DOI 10.1038/ncomms12474

    View details for PubMedID 27527408

  • Identification of Human Neuronal Protein Complexes Reveals Biochemical Activities and Convergent Mechanisms of Action in Autism Spectrum Disorders CELL SYSTEMS Li, J., Ma, Z., Shi, M., Malty, R. H., Aoki, H., Minic, Z., Phanse, S., Jin, K., Wall, D. P., Zhang, Z., Urban, A. E., Hallmayer, J., Babu, M., Snyder, M. 2015; 1 (5): 361-374

    Abstract

    The prevalence of autism spectrum disorders (ASDs) is rapidly growing, yet their molecular basis is poorly understood. We used a systems approach in which ASD candidate genes were mapped onto the ubiquitous human protein complexes and the resulting complexes were characterized. The studies revealed the role of histone deacetylases (HDAC1/2) in regulating the expression of ASD orthologs in the embryonic mouse brain. Proteome-wide screens for the co-complexed subunits with HDAC1 and six other key ASD proteins in neuronal cells revealed a protein interaction network, which displayed preferential expression in fetal brain development, exhibited increased deleterious mutations in ASD cases, and was strongly regulated by FMRP and MECP2, which are causal for Fragile X and Rett syndromes, respectively. Overall, our study reveals molecular components in ASD, suggests a shared mechanism between the syndromic and idiopathic forms of ASDs, and provides a systems framework for analyzing complex human diseases.

    View details for DOI 10.1016/j.cels.2015.11.002

    View details for Web of Science ID 000209926300009

    View details for PubMedCentralID PMC4776331

  • Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions CELL Grubert, F., Zaugg, J. B., Kasowski, M., Ursu, O., Spacek, D. V., Martin, A. R., Greenside, P., Srivas, R., Phanstiel, D. H., Pekowska, A., Heidari, N., Euskirchen, G., Huber, W., Pritchard, J. K., Bustamante, C. D., Steinmetz, L. M., Kundaje, A., Snyder, M. 2015; 162 (5): 1051-1065

    Abstract

    Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.

    View details for DOI 10.1016/j.cell.2015.07.048

    View details for Web of Science ID 000360589900015

    View details for PubMedCentralID PMC4556133

  • Recurrent somatic mutations in regulatory regions of human cancer genomes. Nature genetics Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-716

    Abstract

    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

    View details for DOI 10.1038/ng.3332

    View details for PubMedID 26053494

    View details for PubMedCentralID PMC4485503

  • Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events NATURE BIOTECHNOLOGY Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F., Harel, I., Bustamante, C. D., Rasmussen, M., Snyder, M. P. 2015; 33 (7): 736-742

    Abstract

    Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.

    View details for DOI 10.1038/nbt.3242

    View details for Web of Science ID 000358396100029

    View details for PubMedID 25985263

  • Comparison of the transcriptional landscapes between human and mouse tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lin, S., Lin, Y., Nery, J. R., Urich, M. A., Breschi, A., Davis, C. A., Dobin, A., Zaleski, C., Beer, M. A., Chapman, W. C., Gingeras, T. R., Ecker, J. R., Snyder, M. P. 2014; 111 (48): 17224-17229

    Abstract

    Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.

    View details for DOI 10.1073/pnas.1413624111

    View details for Web of Science ID 000345920800059

    View details for PubMedID 25413365

    View details for PubMedCentralID PMC4260565

  • Principles of regulatory information conservation between mouse and human. Nature Cheng, Y., Ma, Z., Kim, B. H., Wu, W., Cayting, P., Boyle, A. P., Sundaram, V., Xing, X., Dogan, N., Li, J., Euskirchen, G., Lin, S., Lin, Y., Visel, A., Kawli, T., Yang, X., Patacsil, D., Keller, C. A., Giardine, B., Kundaje, A., Wang, T., Pennacchio, L. A., Weng, Z., Hardison, R. C., Snyder, M. P. 2014; 515 (7527): 371-5

    Abstract

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

    View details for DOI 10.1038/nature13985

    View details for PubMedID 25409826

    View details for PubMedCentralID PMC4343047

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weisdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Defining a personal, allele-specific, and single-molecule long-read transcriptome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Tilgner, H., Grubert, F., Sharon, D., Snyder, M. P. 2014; 111 (27): 9869-9874

    Abstract

    Personal transcriptomes in which all of an individual's genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

    View details for DOI 10.1073/pnas.1400447111

    View details for Web of Science ID 000338514800044

    View details for PubMedCentralID PMC4103364

  • Clinical interpretation and implications of whole-genome sequencing. JAMA : the journal of the American Medical Association Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045

    Abstract

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

    View details for PubMedCentralID PMC4119063

  • Divergence in a master variator generates distinct phenotypes and transcriptional responses GENES & DEVELOPMENT Gallagher, J. E., Zheng, W., Rong, X., Miranda, N., Lin, Z., Dunn, B., Zhao, H., Snyder, M. P. 2014; 28 (4): 409-421

    Abstract

    The genetic basis of phenotypic differences in individuals is an important area in biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and different carbon sources. Differences in 4NQO resistance were due to amino acid variation in the transcription factor Yrr1. Yrr1(YJM789) conferred 4NQO resistance but caused slower growth on glycerol, and vice versa with Yrr1(S96), indicating that alleles of Yrr1 confer distinct phenotypes. The binding targets of Yrr1 alleles from diverse yeast strains varied considerably among different strains grown under the same conditions as well as for the same strain under different conditions, indicating that distinct molecular programs are conferred by the different Yrr1 alleles. Our results demonstrate that genetic variations in one important control gene (YRR1) lead to distinct regulatory programs and phenotypes in individuals. We term these polymorphic control genes "master variators."

    View details for DOI 10.1101/gad.228940.113

    View details for Web of Science ID 000331616100009

    View details for PubMedID 24532717

    View details for PubMedCentralID PMC3937518

  • Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Molecular systems biology Li, J., Shi, M., Ma, Z., Zhao, S., Euskirchen, G., Ziskin, J., Urban, A., Hallmayer, J., Snyder, M. 2014; 10 (12): 774

    Abstract

    Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.

    View details for DOI 10.15252/msb.20145487

    View details for PubMedID 25549968

    View details for PubMedCentralID PMC4300495

  • Extensive Variation in Chromatin States Across Humans SCIENCE Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J. B., Kundaje, A., Liu, Y., Boyle, A. P., Zhang, Q. C., Zakharia, F., Spacek, D. V., Li, J., Xie, D., Olarerin-George, A., Steinmetz, L. M., Hogenesch, J. B., Kellis, M., Batzoglou, S., Snyder, M. 2013; 342 (6159): 750-752

    Abstract

    The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

    View details for DOI 10.1126/science.1242510

    View details for PubMedID 24136358

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014

    Abstract

    Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Dynamic trans-Acting Factor Colocalization in Human Cells CELL Xie, D., Boyle, A. P., Wu, L., Zhai, J., Kawli, T., Snyder, M. 2013; 155 (3): 713-724

    Abstract

    Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.

    View details for DOI 10.1016/j.cell.2013.09.043

    View details for Web of Science ID 000326571800023

    View details for PubMedID 24243024

  • Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY Chen, R., Giliani, S., Lanzi, G., Mias, G. I., Lonardi, S., Dobbs, K., Manis, J., Im, H., Gallagher, J. E., Phanstiel, D. H., Euskirchen, G., Lacroute, P., Bettinger, K., Moratto, D., Weinacht, K., Montin, D., Gallo, E., Mangili, G., Porta, F., Notarangelo, L. D., Pedretti, S., Al-Herz, W., Alfahdli, W., Comeau, A. M., Traister, R. S., Pai, S., Carella, G., Facchetti, F., Nadeau, K. C., Snyder, M., Notarangelo, L. D. 2013; 132 (3): 656-664.e17

    Abstract

    Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.

    View details for DOI 10.1016/j.jaci.2013.06.013

    View details for Web of Science ID 000323612000018

    View details for PubMedID 23830146

  • Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612

    Abstract

    Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

    View details for DOI 10.1073/pnas.1219099110

    View details for PubMedID 23690573

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311

    Abstract

    Increased autoantibody reactivity in plasma from Myelodysplastic Syndromes (MDS) patients may provide novel disease signatures and possibly enable early detection. In a two-stage study we investigated Immunoglobulin G reactivity in plasma from MDS, Acute Myeloid Leukemia post MDS patients, and a healthy cohort. In exploratory Stage I we utilized high-throughput protein arrays to identify 35 high-interest proteins showing increased reactivity in patient subgroups compared to healthy controls. In validation Stage II we designed new arrays focusing on 25 of the proteins identified in Stage I and expanded the initial cohort. We validated increased antibody reactivity against AKT3, FCGR3A and ARL8B in patients, which enabled sample classification into stable MDS and healthy individuals. We also detected elevated AKT3 protein levels in MDS patient plasma. The discovery of increased specific autoantibody reactivity in MDS patients provides molecular signatures for classification, supplementing existing risk categorizations, and may enhance diagnostic and prognostic capabilities for MDS.

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Extensive genetic variation in somatic human tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA O'Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E., Snyder, M. P. 2012; 109 (44): 18018-18023

    Abstract

    Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.

    View details for DOI 10.1073/pnas.1213736109

    View details for Web of Science ID 000311149900070

    View details for PubMedID 23043118

    View details for PubMedCentralID PMC3497787

  • An integrated encyclopedia of DNA elements in the human genome NATURE Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigo, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Kim, S. K., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E. C., Trout, D., Varley, K. E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigo, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. J., O'Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K., Yip, K. Y., Birney, E. 2012; 489 (7414): 57-74

    Abstract

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    View details for DOI 10.1038/nature11247

    View details for Web of Science ID 000308347000039

    View details for PubMedID 22955616

    View details for PubMedCentralID PMC3439153

  • Architecture of the human regulatory network derived from ENCODE data NATURE Gerstein, M. B., Kundaje, A., Hariharan, M., Landt, S. G., Yan, K., Cheng, C., Mu, X. J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A. P., Cayting, P., Charos, A., Chen, D. Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E. C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T. E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K. Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P. J., Myers, R. M., Weissman, S. M., Snyder, M. 2012; 489 (7414): 91-100

    Abstract

    Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.

    View details for DOI 10.1038/nature11245

    View details for PubMedID 22955619

  • Linking disease associations with regulatory information in the human genome GENOME RESEARCH Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S., Snyder, M. 2012; 22 (9): 1748-1759

    Abstract

    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

    View details for DOI 10.1101/gr.136127.111

    View details for PubMedID 22955986

  • Annotation of functional variation in personal genomes using RegulomeDB GENOME RESEARCH Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Kasowski, M., Karczewski, K. J., Park, J., Hitz, B. C., Weng, S., Cherry, J. M., Snyder, M. 2012; 22 (9): 1790-1797

    Abstract

    As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

    View details for DOI 10.1101/gr.137323.112

    View details for PubMedID 22955989

  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia GENOME RESEARCH Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., Snyder, M. 2012; 22 (9): 1813-1831

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

    View details for DOI 10.1101/gr.136184.111

    View details for PubMedID 22955991

  • Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes CELL Chen, R., Mias, G. I., Li-Pook-Than, J., Jiang, L., Lam, H. Y., Chen, R., Miriami, E., Karczewski, K. J., Hariharan, M., Dewey, F. E., Cheng, Y., Clark, M. J., Im, H., Habegger, L., Balasubramanian, S., O'Huallachain, M., Dudley, J. T., Hillenmeyer, S., Haraksingh, R., Sharon, D., Euskirchen, G., Lacroute, P., Bettinger, K., Boyle, A. P., Kasowski, M., Grubert, F., Seki, S., Garcia, M., Whirl-Carrillo, M., Gallardo, M., Blasco, M. A., Greenberg, P. L., Snyder, P., Klein, T. E., Altman, R. B., Butte, A. J., Ashley, E. A., Gerstein, M., Nadeau, K. C., Tang, H., Snyder, M. 2012; 148 (6): 1293-1307

    Abstract

    Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

    View details for DOI 10.1016/j.cell.2012.02.009

    View details for PubMedID 22424236

  • Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY Lam, H. Y., Pan, C., Clark, M. J., Lacroute, P., Chen, R., Haraksingh, R., O'Huallachain, M., Gerstein, M. B., Kidd, J. M., Bustamante, C. D., Snyder, M. 2012; 30 (3): 226-229

    View details for Web of Science ID 000301303800013

    View details for PubMedID 22398614

  • Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation CELL Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W., Snyder, M., Ruan, Y. 2012; 148 (1-2): 84-98

    Abstract

    Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other, implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.

    View details for DOI 10.1016/j.cell.2011.12.014

    View details for Web of Science ID 000299540700016

    View details for PubMedID 22265404

    View details for PubMedCentralID PMC3339270

  • Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY Lam, H. Y., Clark, M. J., Chen, R., Chen, R., Natsoulis, G., O'Huallachain, M., Dewey, F. E., Habegger, L., Ashley, E. A., Gerstein, M. B., Butte, A. J., Ji, H. P., Snyder, M. 2012; 30 (1): 78-U118

    Abstract

    Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

    View details for DOI 10.1038/nbt.2065

    View details for Web of Science ID 000299110600023

  • Dissecting phosphorylation networks: lessons learned from yeast EXPERT REVIEW OF PROTEOMICS Mok, J., Zhu, X., Snyder, M. 2011; 8 (6): 775-786

    Abstract

    Protein phosphorylation continues to be regarded as one of the most important post-translational modifications found in eukaryotes and has been implicated in key roles in the development of a number of human diseases. In order to elucidate roles for the 518 human kinases, phosphorylation has routinely been studied using the budding yeast Saccharomyces cerevisiae as a model system. In recent years, a number of technologies have emerged to globally map phosphorylation in yeast. In this article, we review these technologies and discuss how these phosphorylation mapping efforts have shed light on our understanding of kinase signaling pathways and eukaryotic proteomic networks in general.

    View details for DOI 10.1586/EPR.11.64

    View details for Web of Science ID 000297299000013

    View details for PubMedID 22087660

    View details for PubMedCentralID PMC3262144

  • Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF NATURE Iyer, V. R., Horak, C. E., Scafe, C. S., Botstein, D., Snyder, M., Brown, P. O. 2001; 409 (6819): 533-538

    Abstract

    Proteins interact with genomic DNA to bring the genome to life; and these interactions also define many functional features of the genome. SBF and MBF are sequence-specific transcription factors that activate gene expression during the G1/S transition of the cell cycle in yeast. SBF is a heterodimer of Swi4 and Swi6, and MBF is a heterodimer of Mbp1 and Swi6 (refs 1, 3). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function. A small number of SBF and MBF target genes have been identified. Here we define the genomic binding sites of the SBF and MBF transcription factors in vivo, by using DNA microarrays. In addition to the previously characterized targets, we have identified about 200 new putative targets. Our results support the hypothesis that SBF activated genes are predominantly involved in budding, and in membrane and cell-wall biosynthesis, whereas DNA replication and repair are the dominant functions among MBF activated genes. The functional specialization of these factors may provide a mechanism for independent regulation of distinct molecular processes that normally occur in synchrony during the mitotic cell cycle.

    View details for Web of Science ID 000166570500053

    View details for PubMedID 11206552

  • Author Correction: Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Goncalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., HuBMAP Consortium, Borner, K., Snyder, M. P., Ahn, K. J., Allen, J., Anderson, D. M., Anderton, C. R., Curcio, C., Angelin, A., Arvanitis, C., Atta, L., Awosika-Olumo, D., Bahmani, A., Bai, H., Balderrama, K., Balzano, L., Bandyopadhyay, G., Bandyopadhyay, S., Bar-Joseph, Z., Barnhart, K., Barwinska, D., Becich, M., Becker, L., Becker, W., Bedi, K., Bendall, S., Benninger, K., Betancur, D., Bettinger, K., Billings, S., Blood, P., Bolin, D., Border, S., Bosse, M., Bramer, L., Brewer, M., Brusko, M., Bueckle, A., Burke, K., Burnum-Johnson, K., Butcher, E., Butterworth, E., Cai, L., Calandrelli, R., Caldwell, M., Campbell-Thompson, M., Cao, D., Cao-Berg, I., Caprioli, R., Caraccio, C., Caron, A., Carroll, M., Chadwick, C., Chen, A., Chen, D., Chen, F., Chen, H., Chen, J., Chen, L., Chen, L., Chiacchia, K., Cho, S., Chou, P., Choy, L., Cisar, C., Clair, G., Clarke, L., Clouthier, K. A., Colley, M. E., Conlon, K., Conroy, J., Contrepois, K., Corbett, A., Corwin, A., Cotter, D., Courtois, E., Cruz, A., Csonka, C., Czupil, K., Daiya, V., Dale, K., Davanagere, S. A., Dayao, M., de Caestecker, M. P., Decker, A., Deems, S., Degnan, D., Desai, T., Deshpande, V., Deutsch, G., Devlin, M., Diep, D., Dodd, C., Donahue, S., Dong, W., Dos Santos Peixoto, R., Duffy, M., Dufresne, M., Duong, T. E., Dutra, J., Eadon, M. T., El-Achkar, T. M., Enninful, A., Eraslan, G., Eshelman, D., Espin-Perez, A., Esplin, E. D., Esselman, A., Falo, L. D., Falo, L., Fan, J., Fan, R., Farrow, M. 
A., Farzad, N., Favaro, P., Fermin, J., Filiz, F., Filus, S., Fisch, K., Fisher, E., Fisher, S., Flowers, K., Flynn, W. F., Fogo, A. B., Fu, D. A., Fulcher, J., Fung, A., Furst, D., Gallant, M., Gao, F., Gao, Y., Gaulton, K., Gaut, J. P., Gee, J., Ghag, R. R., Ghazanfar, S., Ghose, S., Gisch, D., Gold, I., Gondalia, A., Gorman, B., Greenleaf, W., Greenwald, N., Gregory, B., Guo, R., Gupta, R., Hakimian, H., Haltom, J., Halushka, M., Han, K. S., Hanson, C., Harbury, P., Hardi, J., Harlan, L., Harris, R. C., Hartman, A., Heidari, E., Helfer, J., Helminiak, D., Hemberg, M., Henning, N., Herr, B. W., Ho, J., Holden-Wiltse, J., Hong, S., Hong, Y., Honick, B., Hood, G., Hu, P., Hu, Q., Huang, M., Huyck, H., Imtiaz, T., Isberg, O. G., Itkin, M., Jackson, D., Jacobs, M., Jain, Y., Jewell, D., Jiang, L., Jiang, Z. G., Johnston, S., Joshi, P., Ju, Y., Judd, A., Kagel, A., Kahn, A., Kalavros, N., Kalhor, K., Karagkouni, D., Karathanos, T., Karunamurthy, A., Katari, S., Kates, H., Kaushal, M., Keener, N., Keller, M., Kenney, M., Kern, C., Kharchenko, P., Kim, J., Kingsford, C., Kirwan, J., Kiselev, V., Kishi, J., Kitata, R. B., Knoten, A., Kollar, C., Krishnamoorthy, P., Kruse, A. R., Da, K., Kundaje, A., Kutschera, E., Kwon, Y., Lake, B. B., Lancaster, S., Langlieb, J., Lardenoije, R., Laronda, M., Laskin, J., Lau, K., Lee, H., Lee, M., Lee, M., Strekalova, Y. L., Li, D., Li, J., Li, J., Li, X., Li, Z., Liao, Y., Liaw, T., Lin, P., Lin, Y., Lindsay, S., Liu, C., Liu, Y., Liu, Y., Lott, M., Lotz, M., Lowery, L., Lu, P., Lu, X., Lucarelli, N., Lun, X., Luo, Z., Ma, J., Macosko, E., Mahajan, M., Maier, L., Makowski, D., Malek, M., Manthey, D., Manz, T., Margulies, K., Marioni, J., Martindale, M., Mason, C., Mathews, C., Maye, P., McCallum, C., McDonough, E., McDonough, L., Mcdowell, H., Meads, M., Medina-Serpas, M., Ferreira, R. M., Messinger, J., Metis, K., Migas, L. 
G., Miller, B., Mimar, S., Minor, B., Misra, R., Missarova, A., Mistretta, C., Moens, R., Moerth, E., Moffitt, J., Molla, G., Monroe, M., Monte, E., Morgan, M., Muraro, D., Murphy, B. R., Murray, E., Musen, M. A., Naglah, A., Nasamran, C., Neelakantan, T., Nevins, S., Nguyen, H., Nguyen, N., Nguyen, T., Nguyen, T., Nigra, D., Nofal, M., Nolan, G., Nwanne, G., O'Connor, M., Okuda, K., Olmer, M., O'Neill, K., Otaluka, N., Pang, M., Parast, M., Pasa-Tolic, L., Paten, B., Patterson, N. H., Peng, T., Phillips, G., Pichavant, M., Piehowski, P., Pilner, H., Pingry, E., Pita-Juarez, Y., Plevritis, S., Ploumakis, A., Pouch, A., Pryhuber, G., Puerto, J., Qaurooni, D., Qin, L., Quardokus, E. M., Rajbhandari, P., Rakow-Penner, R., Ramasamy, R., Read, D., Record, E. G., Reeves, D., Ricarte, A., Rodriguez-Soto, A., Ropelewski, A., Rosario, J., Roselkis, M., Rowe, D., Roy, T. K., Ruffalo, M., Ruschman, N., Sabo, A., Sachdev, N., Saka, S., Salamon, D., Sarder, P., Sasaki, H., Satija, R., Saunders, D., Sawka, R., Schey, K., Schlehlein, H., Scholten, D., Schultz, S., Schwartz, L., Schwenk, M., Scibek, R., Segre, A., Serrata, M., Shands, W., Shen, X., Shendure, J., Shephard, H., Shi, L., Shi, T., Shin, D., Shirey, B., Sibilla, M., Silber, M., Silverstein, J., Simmel, D., Simmons, A., Singhal, D., Sivajothi, S., Smits, T., Soncin, F., Song, Q., Stanley, V., Stuart, T., Su, H., Su, P., Sun, X., Surrette, C., Swahn, H., Tan, K., Teichmann, S., Tejomay, A., Tellides, G., Thomas, K., Thomas, T., Thompson, M., Tian, H., Tideman, L., Trapnell, C., Tsai, A. G., Tsai, C., Tsai, L., Tsui, E., Tsui, T., Tung, J., Turner, M., Uranic, J., Vaishnav, E. D., Varra, S. R., Vaskivskyi, V., Velickovic, D., Velickovic, M., Verheyden, J., Waldrip, J., Wallace, D., Wan, X., Wang, A., Wang, F., Wang, M., Wang, S., Wang, X., Wasserfall, C., Wayne, L., Webber, J., Weber, G. 
M., Wei, B., Wei, J., Weimer, A., Welling, J., Wen, X., Wen, Z., Williams, M., Winfree, S., Winograd, N., Woodard, A., Wright, D., Wu, F., Wu, P., Wu, Q., Wu, X., Xing, Y., Xu, T., Yang, M., Yang, M., Yap, J., Ye, D. H., Yin, P., Yuan, Z., Yun, C. J., Zahraei, A., Zemaitis, K., Zhang, B., Zhang, C., Zhang, C., Zhang, C., Zhang, K., Zhang, S., Zhang, T., Zhang, Y., Zhao, B., Zhao, W., Zheng, J. W., Zhong, S., Zhu, B., Zhu, C., Zhu, D., Zhu, Q., Zhu, Y. 2024

    View details for DOI 10.1038/s41556-024-01384-0

    View details for PubMedID 38429479

  • Corrigendum: Advances and potential of omics studies for understanding the development of food allergy. Frontiers in allergy Sindher, S. B., Chin, A. R., Aghaeepour, N., Prince, L., Maecker, H., Shaw, G. M., Stevenson, D., Nadeau, K. C., Snyder, M., Khatri, P., Boyd, S. D., Winn, V. D., Angst, M. S., Chinthrajah, R. S. 2024; 5: 1373485

    Abstract

    [This corrects the article DOI: 10.3389/falgy.2023.1149008.].

    View details for DOI 10.3389/falgy.2024.1373485

    View details for PubMedID 38464397

    View details for PubMedCentralID PMC10921899

  • Rare and common genetic determinants of mitochondrial function determine severity but not risk of amyotrophic lateral sclerosis. Heliyon Harvey, C., Weinreich, M., Lee, J. A., Shaw, A. C., Ferraiuolo, L., Mortiboys, H., Zhang, S., Hop, P. J., Zwamborn, R. A., van Eijk, K., Julian, T. H., Moll, T., Iacoangeli, A., Al Khleifat, A., Quinn, J. P., Pfaff, A. L., Kõks, S., Poulton, J., Battle, S. L., Arking, D. E., Snyder, M. P., Veldink, J. H., Kenna, K. P., Shaw, P. J., Cooper-Knock, J. 2024; 10 (3): e24975

    Abstract

    Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease involving selective vulnerability of energy-intensive motor neurons (MNs). It has been unclear whether mitochondrial function is an upstream driver or a downstream modifier of neurotoxicity. We separated upstream genetic determinants of mitochondrial function, including genetic variation within the mitochondrial genome or autosomes, from downstream changeable factors including mitochondrial DNA copy number (mtCN). Across three cohorts including 6,437 ALS patients, we discovered that a set of mitochondrial haplotypes, chosen because they are linked to measurements of mitochondrial function, are a determinant of ALS survival following disease onset, but do not modify ALS risk. One particular haplotype appeared to be neuroprotective and was significantly over-represented in two cohorts of long-surviving ALS patients. Causal inference for mitochondrial function was achievable using mitochondrial haplotypes, but not autosomal SNPs in traditional Mendelian randomization (MR). Furthermore, rare loss-of-function genetic variants within, and reduced MN expression of, ACADM and DNA2 lead to ∼50% shorter ALS survival; both proteins are implicated in mitochondrial function. Both mtCN and cellular vulnerability are linked to DNA2 function in ALS patient-derived neurons. Finally, mtCN responds dynamically to the onset of ALS independently of mitochondrial haplotype, and is correlated with disease severity. We conclude that, based on the genetic measures we have employed, mitochondrial function is a therapeutic target for amelioration of disease severity but not prevention of ALS.

    View details for DOI 10.1016/j.heliyon.2024.e24975

    View details for PubMedID 38317984

    View details for PubMedCentralID PMC10839612

  • Harnessing human genetics and stem cells for precision cardiovascular medicine. Cell genomics Caudal, A., Snyder, M. P., Wu, J. C. 2024; 4 (2): 100445

    Abstract

    Human induced pluripotent stem cell (iPSC) platforms are valuable for biomedical and pharmaceutical research by providing tissue-specific human cells that retain patients' genetic integrity and display disease phenotypes in a dish. Looking forward, combining iPSC phenotyping platforms with genomic and screening technologies will continue to pave new directions for precision medicine, including genetic prediction, visualization, and treatment of heart disease. This review summarizes the recent use of iPSC technology to unpack the influence of genetic variants in cardiovascular pathology. We focus on various state-of-the-art genomic tools for cardiovascular therapies (including the expansion of genetic toolkits for molecular interrogation, in vitro population studies, and function-based drug screening) and their current applications in patient- and genome-edited iPSC platforms that are heralding new avenues for cardiovascular research.

    View details for DOI 10.1016/j.xgen.2023.100445

    View details for PubMedID 38359791

  • Validation of biomarkers of aging. Nature medicine Moqri, M., Herzog, C., Poganik, J. R., Ying, K., Justice, J. N., Belsky, D. W., Higgins-Chen, A. T., Chen, B. H., Cohen, A. A., Fuellen, G., Hägg, S., Marioni, R. E., Widschwendter, M., Fortney, K., Fedichev, P. O., Zhavoronkov, A., Barzilai, N., Lasky-Su, J., Kiel, D. P., Kennedy, B. K., Cummings, S., Slagboom, P. E., Verdin, E., Maier, A. B., Sebastiano, V., Snyder, M. P., Gladyshev, V. N., Horvath, S., Ferrucci, L. 2024

    Abstract

    The search for biomarkers that quantify biological aging (particularly 'omic'-based biomarkers) has intensified in recent years. Such biomarkers could predict aging-related outcomes and could serve as surrogate endpoints for the evaluation of interventions promoting healthy aging and longevity. However, no consensus exists on how biomarkers of aging should be validated before their translation to the clinic. Here, we review current efforts to evaluate the predictive validity of omic biomarkers of aging in population studies, discuss challenges in comparability and generalizability and provide recommendations to facilitate future validation of biomarkers of aging. Finally, we discuss how systematic validation can accelerate clinical translation of biomarkers of aging and their use in gerotherapeutic clinical trials.

    View details for DOI 10.1038/s41591-023-02784-9

    View details for PubMedID 38355974

    View details for PubMedCentralID 9792204

  • Miscarriage risk assessment: a bioinformatic approach to identifying candidate lethal genes and variants. Human genetics Aminbeidokhti, M., Qu, J., Belur, S., Cakmak, H., Jaswa, E., Lathi, R. B., Sirota, M., Snyder, M. P., Yatsenko, S. A., Rajkovic, A. 2024

    Abstract

    PURPOSE: Miscarriage, often resulting from a variety of genetic factors, is a common pregnancy outcome. Preconception genetic carrier screening (PGCS) identifies at-risk partners for newborn genetic disorders; however, PGCS panels currently lack miscarriage-related genes. In this study, we evaluated the potential impact of both known and candidate genes on prenatal lethality and the effectiveness of PGCS in diverse populations. METHODS: We analyzed 125,748 human exome sequences and mouse and human gene function databases. Our goals were to identify genes crucial for human fetal survival (lethal genes), to find variants not present in a homozygous state in healthy humans, and to estimate carrier rates of known and candidate lethal genes in various populations and ethnic groups. RESULTS: This study identified 138 genes in which heterozygous lethal variants are present in the general population with a frequency of 0.5% or greater. Screening for these 138 genes could identify 4.6% (in the Finnish population) to 39.8% (in the East Asian population) of couples at risk of miscarriage. This explains the cause of pregnancy loss in approximately 1.1-10% of cases affected by biallelic lethal variants. CONCLUSION: This study has identified a set of genes and variants potentially associated with lethality across different ethnic backgrounds. The variation of these genes across ethnic groups underscores the need for a comprehensive, pan-ethnic PGCS panel that includes genes related to miscarriage.

    View details for DOI 10.1007/s00439-023-02637-y

    View details for PubMedID 38302665

  • Untargeted metabolomic profiling in children identifies novel pathways in asthma and atopy. The Journal of allergy and clinical immunology Lejeune, S., Kaushik, A., Parsons, E. S., Chinthrajah, S., Snyder, M., Desai, M., Manohar, M., Prunicki, M., Contrepois, K., Gosset, P., Deschildre, A., Nadeau, K. 2024; 153 (2): 418-434

    Abstract

    Asthma and other atopic disorders can present with varying clinical phenotypes marked by differential metabolomic manifestations and enriched biological pathways. We sought to identify these unique metabolomic profiles in atopy and asthma. We analyzed baseline nonfasted plasma samples from a large multisite pediatric population of 470 children aged <13 years from 3 different sites in the United States and France. Atopy positivity (At+) was defined as skin prick test result of ≥3 mm and/or specific IgE ≥ 0.35 IU/mL and/or total IgE ≥ 173 IU/mL. Asthma positivity (As+) was based on physician diagnosis. The cohort was divided into 4 groups of varying combinations of asthma and atopy, and 6 pairwise analyses were conducted to best assess the differential metabolomic profiles between groups. Two hundred ten children were classified as At-As-, 42 as At+As-, 74 as At-As+, and 144 as At+As+. Untargeted global metabolomic profiles were generated through ultra-high-performance liquid chromatography-tandem mass spectroscopy. We applied 2 independent machine learning classifiers and short-listed 362 metabolites as discriminant features. Our analysis showed the most diverse metabolomic profile in the At+As+/At-As- comparison, followed by the At-As+/At-As- comparison, indicating that asthma is the most discriminant condition associated with metabolomic changes. At+As+ metabolomic profiles were characterized by higher levels of bile acids, sphingolipids, and phospholipids, and lower levels of polyamine, tryptophan, and gamma-glutamyl amino acids. The At+As+ phenotype displays a distinct metabolomic profile suggesting underlying mechanisms such as modulation of host-pathogen and gut microbiota interactions, epigenetic changes in T-cell differentiation, and lower antioxidant properties of the airway epithelium.

    View details for DOI 10.1016/j.jaci.2023.09.040

    View details for PubMedID 38344970

  • Correction: Digital health application integrating wearable data and behavioral patterns improves metabolic health. NPJ digital medicine Zahedani, A. D., McLaughlin, T., Veluvali, A., Aghaeepour, N., Hosseinian, A., Agarwal, S., Ruan, J., Tripathi, S., Woodward, M., Hashemi, N., Snyder, M. 2024; 7 (1): 9

    View details for DOI 10.1038/s41746-024-00996-y

    View details for PubMedID 38216626

  • Semi-supervised Cooperative Learning for Multiomics Data Fusion Ding, D., Shen, X., Snyder, M., Tibshirani, R., Maier, A. K., Schnabel, J. A., Tiwari, P., Stegle, O. SPRINGER INTERNATIONAL PUBLISHING AG. 2024: 54-63
  • Multi-omics in stress and health research: study designs that will drive the field forward. Stress (Amsterdam, Netherlands) Mengelkoch, S., Gassen, J., Lev-Ari, S., Alley, J. C., Schüssler-Fiorenza Rose, S. M., Snyder, M. P., Slavich, G. M. 2024; 27 (1): 2321610

    Abstract

    Despite decades of stress research, there still exist substantial gaps in our understanding of how social, environmental, and biological factors interact and combine with developmental stressor exposures, cognitive appraisals of stressors, and psychosocial coping processes to shape individuals' stress reactivity, health, and disease risk. Relatively new biological profiling approaches, called multi-omics, are helping address these issues by enabling researchers to quantify thousands of molecules from a single blood or tissue sample, thus providing a panoramic snapshot of the molecular processes occurring in an organism from a systems perspective. In this review, we summarize two types of research designs for which multi-omics approaches are best suited, and describe how these approaches can help advance our understanding of stress processes and the development, prevention, and treatment of stress-related pathologies. We first discuss incorporating multi-omics approaches into theory-rich, intensive longitudinal study designs to characterize, in high-resolution, the transition to stress-related multisystem dysfunction and disease throughout development. Next, we discuss how multi-omics approaches should be incorporated into intervention research to better understand the transition from stress-related dysfunction back to health, which can help inform novel precision medicine approaches to managing stress and fostering biopsychosocial resilience. Throughout, we provide concrete recommendations for types of studies that will help advance stress research, and translate multi-omics data into better health and health care.

    View details for DOI 10.1080/10253890.2024.2321610

    View details for PubMedID 38425100

  • Using Ecological Momentary Assessments to Study How Daily Fluctuations in Psychological States Impact Stress, Well-Being, and Health. Journal of clinical medicine Mengelkoch, S., Moriarity, D. P., Novak, A. M., Snyder, M. P., Slavich, G. M., Lev-Ari, S. 2023; 13 (1)

    Abstract

    Despite great interest in how dynamic fluctuations in psychological states such as mood, social safety, energy, present-focused attention, and burnout impact stress, well-being, and health, most studies examining these constructs use retrospective assessments with relatively long time-lags. Here, we discuss how ecological momentary assessments (EMAs) address methodological issues associated with retrospective reports to help reveal dynamic associations between psychological states at small timescales that are often missed in stress and health research. In addition to helping researchers characterize daily and within-day fluctuations and temporal dynamics between different health-relevant processes, EMAs can elucidate mechanisms through which interventions reduce stress and enhance well-being. EMAs can also be used to identify changes that precede critical health events, which can in turn be used to deliver ecological momentary interventions, or just-in-time interventions, to help prevent such events from occurring. To enable this work, we provide examples of scales and single-item questions used in EMA studies, recommend study designs and statistical approaches that capitalize on EMA data, and discuss limitations of EMA methods. In doing so, we aim to demonstrate how, when used carefully, EMA methods are well poised to greatly advance our understanding of how intrapersonal dynamics affect stress levels, well-being, and human health.

    View details for DOI 10.3390/jcm13010024

    View details for PubMedID 38202031

  • NGLY1 mutations cause protein aggregation in human neurons. Cell reports Manole, A., Wong, T., Rhee, A., Novak, S., Chin, S. M., Tsimring, K., Paucar, A., Williams, A., Newmeyer, T. F., Schafer, S. T., Rosh, I., Kaushik, S., Hoffman, R., Chen, S., Wang, G., Snyder, M., Cuervo, A. M., Andrade, L., Manor, U., Lee, K., Jones, J. R., Stern, S., Marchetto, M. C., Gage, F. H. 2023; 42 (12): 113466

    Abstract

    Biallelic mutations in the gene that encodes the enzyme N-glycanase 1 (NGLY1) cause a rare disease with multi-symptomatic features including developmental delay, intellectual disability, neuropathy, and seizures. NGLY1's activity in human neural cells is currently not well understood. To understand how NGLY1 gene loss leads to the specific phenotypes of NGLY1 deficiency, we employed direct conversion of NGLY1 patient-derived induced pluripotent stem cells (iPSCs) to functional cortical neurons. Transcriptomic, proteomic, and functional studies of iPSC-derived neurons lacking NGLY1 function revealed several major cellular processes that were altered, including protein aggregate-clearing functionality, mitochondrial homeostasis, and synaptic dysfunctions. These phenotypes were rescued by introduction of a functional NGLY1 gene and were observed in iPSC-derived mature neurons but not astrocytes. Finally, laser capture microscopy followed by mass spectrometry provided detailed characterization of the composition of protein aggregates specific to NGLY1-deficient neurons. Future studies will harness this knowledge for therapeutic development.

    View details for DOI 10.1016/j.celrep.2023.113466

    View details for PubMedID 38039131

  • Reduced FOXF1 links unrepaired DNA damage to pulmonary arterial hypertension. Nature communications Isobe, S., Nair, R. V., Kang, H. Y., Wang, L., Moonen, J. R., Shinohara, T., Cao, A., Taylor, S., Otsuki, S., Marciano, D. P., Harper, R. L., Adil, M. S., Zhang, C., Lago-Docampo, M., Körbelin, J., Engreitz, J. M., Snyder, M. P., Rabinovitch, M. 2023; 14 (1): 7578

    Abstract

    Pulmonary arterial hypertension (PAH) is a progressive disease in which pulmonary arterial (PA) endothelial cell (EC) dysfunction is associated with unrepaired DNA damage. BMPR2 is the most common genetic cause of PAH. We report that human PAEC with reduced BMPR2 have persistent DNA damage in room air after hypoxia (reoxygenation), as do mice with EC-specific deletion of Bmpr2 (EC-Bmpr2-/-) and persistent pulmonary hypertension. Similar findings are observed in PAEC with loss of the DNA damage sensor ATM, and in mice with Atm deleted in EC (EC-Atm-/-). Gene expression analysis of EC-Atm-/- and EC-Bmpr2-/- lung EC reveals reduced Foxf1, a transcription factor with selectivity for lung EC. Reducing FOXF1 in control PAEC induces DNA damage and impaired angiogenesis whereas transfection of FOXF1 in PAH PAEC repairs DNA damage and restores angiogenesis. Lung EC targeted delivery of Foxf1 to reoxygenated EC-Bmpr2-/- mice repairs DNA damage, induces angiogenesis and reverses pulmonary hypertension.

    View details for DOI 10.1038/s41467-023-43039-y

    View details for PubMedID 37989727

    View details for PubMedCentralID 4737700

  • Integrative multi-omic profiling of adult mouse brain endothelial cells and potential implications in Alzheimer's disease. Cell reports Yu, M., Nie, Y., Yang, J., Yang, S., Li, R., Rao, V., Hu, X., Fang, C., Li, S., Song, D., Guo, F., Snyder, M. P., Chang, H. Y., Kuo, C. J., Xu, J., Chang, J. 2023; 42 (11): 113392

    Abstract

    The blood-brain barrier (BBB) is primarily manifested by a variety of physiological properties of brain endothelial cells (ECs), but the molecular foundation for these properties remains incompletely clear. Here, we generate a comprehensive molecular atlas of adult brain ECs using acutely purified mouse ECs and integrated multi-omics. Using RNA sequencing (RNA-seq) and proteomics, we identify the transcripts and proteins selectively enriched in brain ECs and demonstrate that they are partially correlated. Using single-cell RNA-seq, we dissect the molecular basis of functional heterogeneity of brain ECs. Using integrative epigenomics and transcriptomics, we determine that TCF/LEF, SOX, and ETS families are top-ranked transcription factors regulating the BBB. We then validate the identified brain-EC-enriched proteins and transcription factors in normal mouse and human brain tissue and assess their expression changes in mice with Alzheimer's disease. Overall, we present a valuable resource with broad implications for regulation of the BBB and treatment of neurological disorders.

    View details for DOI 10.1016/j.celrep.2023.113392

    View details for PubMedID 37925638

  • Mental Health for All: The Case for Investing in Digital Mental Health to Improve Global Outcomes, Access, and Innovation in Low-Resource Settings. Journal of clinical medicine Faria, M., Zin, S. T., Chestnov, R., Novak, A. M., Lev-Ari, S., Snyder, M. 2023; 12 (21)

    Abstract

    Mental health disorders are an increasing global public health concern that contribute to morbidity, mortality, disability, and healthcare costs across the world. Biomedical and psychological research has come a long way in identifying the importance of mental health and its impact on behavioral risk factors, physiological health, and overall quality of life. Despite this, access to psychological and psychiatric services remains widely unavailable and is a challenge for many healthcare systems, particularly those in developing countries. This review article highlights the strengths and opportunities brought forward by digital mental health in narrowing this divide. Further, it points to the economic and societal benefits of effectively managing mental illness, making a case for investing resources into mental healthcare as a larger priority for large non-governmental organizations and individual nations across the globe.

    View details for DOI 10.3390/jcm12216735

    View details for PubMedID 37959201

  • Relationship of Heterologous Virus Responses and Outcomes in Hospitalized COVID-19 Patients. Journal of immunology (Baltimore, Md. : 1950) Rosenberg-Hasson, Y., Holmes, T. H., Diray-Arce, J., Chen, J., Kellogg, R., Snyder, M., Becker, P. M., Ozonoff, A., Rouphael, N., Reed, E. F., Maecker, H. T. 2023; 211 (8): 1224-1231

    Abstract

    The clinical trajectory of COVID-19 may be influenced by previous responses to heterologous viruses. We examined the relationship of Abs against different viruses to clinical trajectory groups from the National Institutes of Health IMPACC (Immunophenotyping Assessment in a COVID-19 Cohort) study of hospitalized COVID-19 patients. Whereas initial Ab titers to SARS-CoV-2 tended to be higher with increasing severity (excluding fatal disease), those to seasonal coronaviruses trended in the opposite direction. Initial Ab titers to influenza and parainfluenza viruses also tended to be lower with increasing severity. However, no significant relationship was observed for Abs to other viruses, including measles, CMV, EBV, and respiratory syncytial virus. We hypothesize that some individuals may produce lower or less durable Ab responses to respiratory viruses generally (reflected in lower baseline titers in our study), and that this may carry over into poorer outcomes for COVID-19 (despite high initial SARS-CoV-2 titers). We further looked at longitudinal changes in Ab responses to heterologous viruses, but found little change during the course of acute COVID-19 infection. We saw significant trends with age for Ab levels to many of these viruses, but no difference in longitudinal SARS-CoV-2 titers for those with high versus low seasonal coronavirus titers. We detected no difference in longitudinal SARS-CoV-2 titers for CMV seropositive versus seronegative patients, although there was an overrepresentation of CMV seropositives among the IMPACC cohort, compared with expected frequencies in the United States population. Our results both reinforce findings from other studies and suggest (to our knowledge) new relationships between the response to SARS-CoV-2 and Abs to heterologous viruses.

    View details for DOI 10.4049/jimmunol.2300391

    View details for PubMedID 37756530

    View details for PubMedCentralID PMC10539027

  • Integrative omic profiling and analyses in two pig heart to human xenotransplants Keating, B., Schmauch, E., Piening, B., Xia, B., Zhu, C., Chang, B., Khalil, K., Kim, J., Weldon, E., Pass, H., Ayares, D., Griesemer, A., Mangiola, M., Stern, J., Snyder, M. P., Boeke, J., Montgomery, R. A. LIPPINCOTT WILLIAMS & WILKINS. 2023: 137
  • Integration of spatial and single-cell data across modalities with weakly linked features. Nature biotechnology Chen, S., Zhu, B., Huang, S., Hickey, J. W., Lin, K. Z., Snyder, M., Greenleaf, W. J., Nolan, G. P., Zhang, N. R., Ma, Z. 2023

    Abstract

    Although single-cell and spatial sequencing methods enable simultaneous measurement of more than one biological modality, no technology can capture all modalities within the same cell. For current data integration methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori 'linked' features. We describe matching X-modality via fuzzy smoothed embedding (MaxFuse), a cross-modal data integration method that, through iterative coembedding, data smoothing and cell matching, uses all information in each modality to obtain high-quality integration even when features are weakly linked. MaxFuse is modality-agnostic and demonstrates high robustness and accuracy in the weak linkage scenario, achieving 20~70% relative improvement over existing methods under key evaluation metrics on benchmarking datasets. A prototypical example of weak linkage is the integration of spatial proteomic data with single-cell sequencing data. On two example analyses of this type, MaxFuse enabled the spatial consolidation of proteomic, transcriptomic and epigenomic information at single-cell resolution on the same tissue section.

    View details for DOI 10.1038/s41587-023-01935-0

    View details for PubMedID 37679544

    View details for PubMedCentralID 5669064

  • Author Correction: Lipid droplets and peroxisomes are co-regulated to drive lifespan extension in response to mono-unsaturated fatty acids. Nature cell biology Papsdorf, K., Miklas, J. W., Hosseini, A., Cabruja, M., Morrow, C. S., Savini, M., Yu, Y., Silva-García, C. G., Haseley, N. R., Murphy, L. M., Yao, P., de Launoit, E., Dixon, S. J., Snyder, M. P., Wang, M. C., Mair, W. B., Brunet, A. 2023

    View details for DOI 10.1038/s41556-023-01220-x

    View details for PubMedID 37567997

  • Multi-omics approaches in psychoneuroimmunology and health research: Conceptual considerations and methodological recommendations. Brain, behavior, and immunity Mengelkoch, S., Lautman, Z., Alley, J. C., Roos, L. G., Ehlert, B., Moriarity, D. P., Lancaster, S., Miryam Schussler-Fiorenza Rose, S., Snyder, M. P., Slavich, G. M. 2023

    Abstract

    The field of psychoneuroimmunology (PNI) has grown substantially in both relevance and prominence over the past 40 years. Notwithstanding its impressive trajectory, a majority of PNI studies are still based on a relatively small number of analytes. To advance this work, we suggest that PNI, and health research in general, can benefit greatly from adopting a multi-omics approach, which involves integrating data across multiple biological levels (e.g., the genome, proteome, transcriptome, metabolome, lipidome, and microbiome/metagenome) to more comprehensively profile biological functions and relate these profiles to clinical and behavioral outcomes. To assist investigators in this endeavor, we provide an overview of multi-omics research, highlight recent landmark multi-omics studies investigating human health and disease risk, and discuss how multi-omics can be applied to better elucidate links between psychological, nervous system, and immune system activity. In doing so, we describe how to design high-quality multi-omics PNI studies, decide which biological samples (e.g., blood, stool, urine, saliva, solid tissue) are most relevant, incorporate behavioral and wearable sensing data into multi-omics research, and understand key data quality, integration, analysis, and interpretation issues. PNI researchers are addressing some of the most interesting and important questions at the intersection of psychology, neuroscience, and immunology. Applying a multi-omics approach to this work will greatly expand the horizon of what is possible in PNI and has the potential to revolutionize our understanding of mind-body medicine.

    View details for DOI 10.1016/j.bbi.2023.07.022

    View details for PubMedID 37543247

  • Organ Mapping Antibody Panels: a community resource for standardized multiplexed tissue imaging. Nature methods Quardokus, E. M., Saunders, D. C., McDonough, E., Hickey, J. W., Werlein, C., Surrette, C., Rajbhandari, P., Casals, A. M., Tian, H., Lowery, L., Neumann, E. K., Björklund, F., Neelakantan, T. V., Croteau, J., Wiblin, A. E., Fisher, J., Livengood, A. J., Dowell, K. G., Silverstein, J. C., Spraggins, J. M., Pryhuber, G. S., Deutsch, G., Ginty, F., Nolan, G. P., Melov, S., Jonigk, D., Caldwell, M. A., Vlachos, I. S., Muller, W., Gehlenborg, N., Stockwell, B. R., Lundberg, E., Snyder, M. P., Germain, R. N., Camarillo, J. M., Kelleher, N. L., Börner, K., Radtke, A. J. 2023

    Abstract

    Multiplexed antibody-based imaging enables the detailed characterization of molecular and cellular organization in tissues. Advances in the field now allow high-parameter data collection (>60 targets); however, considerable expertise and capital are needed to construct the antibody panels employed by these methods. Organ mapping antibody panels are community-validated resources that save time and money, increase reproducibility, accelerate discovery and support the construction of a Human Reference Atlas.

    View details for DOI 10.1038/s41592-023-01846-7

    View details for PubMedID 37468619

    View details for PubMedCentralID 10335836

  • Segmentation of human functional tissue units in support of a Human Reference Atlas. Communications biology Jain, Y., Godwin, L. L., Ju, Y., Sood, N., Quardokus, E. M., Bueckle, A., Longacre, T., Horning, A., Lin, Y., Esplin, E. D., Hickey, J. W., Snyder, M. P., Patterson, N. H., Spraggins, J. M., Börner, K. 2023; 6 (1): 717

    Abstract

    The Human BioMolecular Atlas Program (HuBMAP) aims to compile a Human Reference Atlas (HRA) for the healthy adult body at the cellular level. Functional tissue units (FTUs), relevant for HRA construction, are of pathobiological significance. Manual segmentation of FTUs does not scale; highly accurate and performant, open-source machine-learning algorithms are needed. We designed and hosted a Kaggle competition that focused on development of such algorithms and 1200 teams from 60 countries participated. We present the competition outcomes and an expanded analysis of the winning algorithms on additional kidney and colon tissue data, and conduct a pilot study to understand spatial location and density of FTUs across the kidney. The top algorithm from the competition, Tom, outperforms other algorithms in the expanded study, while using fewer computational resources. Tom was added to the HuBMAP infrastructure to run kidney FTU segmentation at scale, showcasing the value of Kaggle competitions for advancing research.

    View details for DOI 10.1038/s42003-023-04848-5

    View details for PubMedID 37468557

    View details for PubMedCentralID PMC10356924

  • Reverse-ChIP Techniques for Identifying Locus-Specific Proteomes: A Key Tool in Unlocking the Cancer Regulome. Cells MacKenzie, T. M., Cisneros, R., Maynard, R. D., Snyder, M. P. 2023; 12 (14)

    Abstract

    A phenotypic hallmark of cancer is aberrant transcriptional regulation. Transcriptional regulation is controlled by a complicated array of molecular factors, including the presence of transcription factors, the deposition of histone post-translational modifications, and long-range DNA interactions. Determining the molecular identity and function of these various factors is necessary to understand specific aspects of cancer biology and reveal potential therapeutic targets. Regulation of the genome by specific factors is typically studied using chromatin immunoprecipitation followed by sequencing (ChIP-Seq) that identifies genome-wide binding interactions through the use of factor-specific antibodies. A long-standing goal in many laboratories has been the development of a 'reverse-ChIP' approach to identify unknown binding partners at loci of interest. A variety of strategies have been employed to enable the selective biochemical purification of sequence-defined chromatin regions, including single-copy loci, and the subsequent analytical detection of associated proteins. This review covers mass spectrometry techniques that enable quantitative proteomics before providing a survey of approaches toward the development of strategies for the purification of sequence-specific chromatin as a 'reverse-ChIP' technique. A fully realized reverse-ChIP technique holds great potential for identifying cancer-specific targets and the development of personalized therapeutic regimens.

    View details for DOI 10.3390/cells12141860

    View details for PubMedID 37508524

  • Author Correction: Clonal haematopoiesis and risk of chronic liver disease. Nature Wong, W. J., Emdin, C., Bick, A. G., Zekavat, S. M., Niroula, A., Pirruccello, J. P., Dichtel, L., Griffin, G., Uddin, M. M., Gibson, C. J., Kovalcik, V., Lin, A. E., McConkey, M. E., Vromman, A., Sellar, R. S., Kim, P. G., Agrawal, M., Weinstock, J., Long, M. T., Yu, B., Banerjee, R., Nicholls, R. C., Dennis, A., Kelly, M., Loh, P., McCarroll, S., Boerwinkle, E., Vasan, R. S., Jaiswal, S., Johnson, A. D., Chung, R. T., Corey, K., Levy, D., Ballantyne, C., NHLBI TOPMed Hematology Working Group, Ebert, B. L., Natarajan, P., Abe, N., Abecasis, G., Aguet, F., Albert, C., Almasy, L., Alonso, A., Ament, S., Anderson, P., Anugu, P., Applebaum-Bowden, D., Ardlie, K., Arking, D., Arnett, D. K., Ashley-Koch, A., Aslibekyan, S., Assimes, T., Auer, P., Avramopoulos, D., Ayas, N., Balasubramanian, A., Barnard, J., Barnes, K., Barr, R. G., Barron-Casella, E., Barwick, L., Beaty, T., Beck, G., Becker, D., Becker, L., Beer, R., Beitelshees, A., Benjamin, E., Benos, T., Bezerra, M., Bielak, L., Bis, J., Blackwell, T., Blangero, J., Blue, N., Bowden, D. W., Bowler, R., Brody, J., Broeckel, U., Broome, J., Brown, D., Bunting, K., Burchard, E., Bustamante, C., Buth, E., Cade, B., Cardwell, J., Carey, V., Carrier, J., Carson, A. P., Carty, C., Casaburi, R., Casas Romero, J. P., Casella, J., Castaldi, P., Chaffin, M., Chang, C., Chang, Y., Chasman, D., Chavan, S., Chen, B., Chen, W., Chen, Y. I., Cho, M., Choi, S. H., Chuang, L., Chung, M., Chung, R., Clish, C., Comhair, S., Conomos, M., Cornell, E., Correa, A., Crandall, C., Crapo, J., Cupples, L. A., Curran, J., Curtis, J., Custer, B., Damcott, C., Darbar, D., David, S., Davis, C., Daya, M., de Andrade, M., de Las Fuentes, L., de Vries, P., DeBaun, M., Deka, R., DeMeo, D., Devine, S., Dinh, H., Doddapaneni, H., Duan, Q., Dugan-Perez, S., Duggirala, R., Durda, J. P., Dutcher, S. K., Eaton, C., Ekunwe, L., El Boueiz, A., Ellinor, P., Emery, L., Erzurum, S., Farber, C., Farek, J., Fingerlin, T., Flickinger, M., Fornage, M., Franceschini, N., Frazar, C., Fu, M., Fullerton, S. M., Fulton, L., Gabriel, S., Gan, W., Gao, S., Gao, Y., Gass, M., Geiger, H., Gelb, B., Geraci, M., Germer, S., Gerszten, R., Ghosh, A., Gibbs, R., Gignoux, C., Gladwin, M., Glahn, D., Gogarten, S., Gong, D., Goring, H., Graw, S., Gray, K. J., Grine, D., Gross, C., Gu, C. C., Guan, Y., Guo, X., Gupta, N., Haessler, J., Hall, M., Han, Y., Hanly, P., Harris, D., Hawley, N. L., He, J., Heavner, B., Heckbert, S., Hernandez, R., Herrington, D., Hersh, C., Hidalgo, B., Hixson, J., Hobbs, B., Hokanson, J., Hong, E., Hoth, K., Hsiung, C. A., Hu, J., Hung, Y., Huston, H., Hwu, C. M., Irvin, M. R., Jackson, R., Jain, D., Jaquish, C., Johnsen, J., Johnson, C., Johnston, R., Jones, K., Kang, H. M., Kaplan, R., Kardia, S., Kelly, S., Kenny, E., Kessler, M., Khan, A., Khan, Z., Kim, W., Kimoff, J., Kinney, G., Konkle, B., Kooperberg, C., Kramer, H., Lange, C., Lange, E., Lange, L., Laurie, C., Laurie, C., LeBoff, M., Lee, J., Lee, S., Lee, W., LeFaive, J., Levine, D., Lewis, J., Li, X., Li, Y., Lin, H., Lin, H., Lin, X., Liu, S., Liu, Y., Liu, Y., Loos, R. J., Lubitz, S., Lunetta, K., Luo, J., Magalang, U., Mahaney, M., Make, B., Manichaikul, A., Manning, A., Manson, J., Martin, L., Marton, M., Mathai, S., Mathias, R., May, S., McArdle, P., McDonald, M., McFarland, S., McGarvey, S., McGoldrick, D., McHugh, C., McNeil, B., Mei, H., Meigs, J., Menon, V., Mestroni, L., Metcalf, G., Meyers, D. A., Mignot, E., Mikulla, J., Min, N., Minear, M., Minster, R. L., Mitchell, B. D., Moll, M., Momin, Z., Montasser, M. E., Montgomery, C., Muzny, D., Mychaleckyj, J. C., Nadkarni, G., Naik, R., Naseri, T., Nekhai, S., Nelson, S. C., Neltner, B., Nessner, C., Nickerson, D., Nkechinyere, O., North, K., O'Connell, J., O'Connor, T., Ochs-Balcom, H., Okwuonu, G., Pack, A., Paik, D. T., Palmer, N., Pankow, J., Papanicolaou, G., Parker, C., Peloso, G., Peralta, J. M., Perez, M., Perry, J., Peters, U., Peyser, P., Phillips, L. S., Pleiness, J., Pollin, T., Post, W., Powers Becker, J., Preethi Boorgula, M., Preuss, M., Psaty, B., Qasba, P., Qiao, D., Qin, Z., Rafaels, N., Raffield, L., Rajendran, M., Rao, D. C., Rasmussen-Torvik, L., Ratan, A., Redline, S., Reed, R., Reeves, C., Regan, E., Reiner, A., Reupena, M. S., Rice, K., Rich, S., Robillard, R., Robine, N., Roden, D., Roselli, C., Rotter, J., Ruczinski, I., Runnels, A., Russell, P., Ruuska, S., Ryan, K., Sabino, E. C., Saleheen, D., Salimi, S., Salvi, S., Salzberg, S., Sandow, K., Sankaran, V. G., Santibanez, J., Schwander, K., Schwartz, D., Sciurba, F., Seidman, C., Seidman, J., Series, F., Sheehan, V., Sherman, S. L., Shetty, A., Shetty, A., Sheu, W. H., Shoemaker, M. B., Silver, B., Silverman, E., Skomro, R., Smith, A. V., Smith, J., Smith, J., Smith, N., Smith, T., Smoller, S., Snively, B., Snyder, M., Sofer, T., Sotoodehnia, N., Stilp, A. M., Storm, G., Streeten, E., Su, J. L., Sung, Y. J., Sylvia, J., Szpiro, A., Taliun, D., Tang, H., Taub, M., Taylor, K. D., Taylor, M., Taylor, S., Telen, M., Thornton, T. A., Threlkeld, M., Tinker, L., Tirschwell, D., Tishkoff, S., Tiwari, H., Tong, C., Tracy, R., Tsai, M., Vaidya, D., Van Den Berg, D., VandeHaar, P., Vrieze, S., Walker, T., Wallace, R., Walts, A., Wang, F. F., Wang, H., Wang, J., Watson, K., Watt, J., Weeks, D. E., Weir, B., Weiss, S. T., Weng, L., Wessel, J., Willer, C., Williams, K., Williams, L. K., Williams, S., Wilson, C., Wilson, J., Winterkorn, L., Wong, Q., Wu, J., Xu, H., Yanek, L., Yang, I., Yu, K., Zhang, Y., Zhao, S. X., Zhao, W., Zhu, X., Ziv, E., Zody, M., Zoellner, S. 2023

    View details for DOI 10.1038/s41586-023-06375-z

    View details for PubMedID 37400552

  • A Roadmap for the Human Gut Cell Atlas. Nature reviews. Gastroenterology & hepatology Zilbauer, M., James, K. R., Kaur, M., Pott, S., Li, Z., Burger, A., Thiagarajah, J. R., Burclaff, J., Jahnsen, F. L., Perrone, F., Ross, A. D., Matteoli, G., Stakenborg, N., Sujino, T., Moor, A., Bartolome-Casado, R., Bækkevold, E. S., Zhou, R., Xie, B., Lau, K. S., Din, S., Magness, S. T., Yao, Q., Beyaz, S., Arends, M., Denadai-Souza, A., Coburn, L. A., Gaublomme, J. T., Baldock, R., Papatheodorou, I., Ordovas-Montanes, J., Boeckxstaens, G., Hupalowska, A., Teichmann, S. A., Regev, A., Xavier, R. J., Simmons, A., Snyder, M. P., Wilson, K. T. 2023

    Abstract

    The number of studies investigating the human gastrointestinal tract using various single-cell profiling methods has increased substantially in the past few years. Although this increase provides a unique opportunity for the generation of the first comprehensive Human Gut Cell Atlas (HGCA), there remains a range of major challenges ahead. Above all, the ultimate success will largely depend on a structured and coordinated approach that aligns global efforts undertaken by a large number of research groups. In this Roadmap, we discuss a comprehensive forward-thinking direction for the generation of the HGCA on behalf of the Gut Biological Network of the Human Cell Atlas. Based on the consensus opinion of experts from across the globe, we outline the main requirements for the first complete HGCA by summarizing existing data sets and highlighting anatomical regions and/or tissues with limited coverage. We provide recommendations for future studies and discuss key methodologies and the importance of integrating the healthy gut atlas with related diseases and gut organoids. Importantly, we critically overview the computational tools available and provide recommendations to overcome key challenges.

    View details for DOI 10.1038/s41575-023-00784-1

    View details for PubMedID 37258747

    View details for PubMedCentralID 5541232

  • Multiomic signals associated with maternal epidemiological factors contributing to preterm birth in low- and middle-income countries. Science advances Espinosa, C. A., Khan, W., Khanam, R., Das, S., Khalid, J., Pervin, J., Kasaro, M. P., Contrepois, K., Chang, A. L., Phongpreecha, T., Michael, B., Ellenberger, M., Mehmood, U., Hotwani, A., Nizar, A., Kabir, F., Wong, R. J., Becker, M., Berson, E., Culos, A., De Francesco, D., Mataraso, S., Ravindra, N., Thuraiappah, M., Xenochristou, M., Stelzer, I. A., Marić, I., Dutta, A., Raqib, R., Ahmed, S., Rahman, S., Hasan, A. S., Ali, S. M., Juma, M. H., Rahman, M., Aktar, S., Deb, S., Price, J. T., Wise, P. H., Winn, V. D., Druzin, M. L., Gibbs, R. S., Darmstadt, G. L., Murray, J. C., Stringer, J. S., Gaudilliere, B., Snyder, M. P., Angst, M. S., Rahman, A., Baqui, A. H., Jehan, F., Nisar, M. I., Vwalika, B., Sazawal, S., Shaw, G. M., Stevenson, D. K., Aghaeepour, N. 2023; 9 (21): eade7692

    Abstract

    Preterm birth (PTB) is the leading cause of death in children under five, yet comprehensive studies are hindered by its multiple complex etiologies. Epidemiological associations between PTB and maternal characteristics have been previously described. This work used multiomic profiling and multivariate modeling to investigate the biological signatures of these characteristics. Maternal covariates were collected during pregnancy from 13,841 pregnant women across five sites. Plasma samples from 231 participants were analyzed to generate proteomic, metabolomic, and lipidomic datasets. Machine learning models showed robust performance for the prediction of PTB (AUROC = 0.70), time-to-delivery (r = 0.65), maternal age (r = 0.59), gravidity (r = 0.56), and BMI (r = 0.81). Time-to-delivery biological correlates included fetal-associated proteins (e.g., ALPP, AFP, and PGF) and immune proteins (e.g., PD-L1, CCL28, and LIFR). Maternal age negatively correlated with collagen COL9A1, gravidity with endothelial NOS and inflammatory chemokine CXCL13, and BMI with leptin and structural protein FABP4. These results provide an integrated view of epidemiological factors associated with PTB and identify biological signatures of clinical covariates affecting this disease.

    View details for DOI 10.1126/sciadv.ade7692

    View details for PubMedID 37224249

  • The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv : the preprint server for biology Reese, F., Williams, B., Balderrama-Gutierrez, G., Wyman, D., Çelik, M. H., Rebboah, E., Rezaie, N., Trout, D., Razavi-Mohseni, M., Jiang, Y., Borsari, B., Morabito, S., Liang, H. Y., McGill, C. J., Rahmanian, S., Sakr, J., Jiang, S., Zeng, W., Carvalho, K., Weimer, A. K., Dionne, L. A., McShane, A., Bedi, K., Elhajjajy, S. I., Upchurch, S., Jou, J., Youngworth, I., Gabdank, I., Sud, P., Jolanki, O., Strattan, J. S., Kagda, M. S., Snyder, M. P., Hitz, B. C., Moore, J. E., Weng, Z., Bennett, D., Reinholdt, L., Ljungman, M., Beer, M. A., Gerstein, M. B., Pachter, L., Guigó, R., Wold, B. J., Mortazavi, A. 2023

    Abstract

    The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.

    View details for DOI 10.1101/2023.05.15.540865

    View details for PubMedID 37292896

    View details for PubMedCentralID PMC10245583

  • Organ-specific aging and the risk of chronic diseases. Nature medicine Moqri, M., Snyder, M. 2023

    View details for DOI 10.1038/s41591-023-02338-z

    View details for PubMedID 37161069

  • Gut Microbiome-Based Management of Patients With Heart Failure: JACC Review Topic of the Week. Journal of the American College of Cardiology Mamic, P., Snyder, M., Tang, W. H. 2023; 81 (17): 1729-1739

    Abstract

    Despite therapeutic advances, chronic heart failure (HF) is still associated with significant risk of morbidity and mortality. The course of disease and responses to therapies vary widely among individuals with HF, highlighting the need for precision medicine approaches. Gut microbiome stands to be an important aspect of precision medicine in HF. Exploratory clinical studies have revealed shared patterns of gut microbiome dysregulation in this disease, with mechanistic animal studies providing evidence for active involvement of the gut microbiome in development and pathophysiology of HF. Deeper insights into gut microbiome-host interactions in patients with HF promise to deliver novel disease biomarkers, preventative and therapeutic targets, and improve disease risk stratification. This knowledge may enable a paradigm shift in how we care for patients with HF, and pave the path toward improved clinical outcomes through personalized HF care.

    View details for DOI 10.1016/j.jacc.2023.02.045

    View details for PubMedID 37100490

  • Association between the dynamics of the gut microbiota and responsiveness to mental health therapy Zhou, X., Ganz, A. B., Lu, X., Li, Y., Snyder, M. AMER ASSOC IMMUNOLOGISTS. 2023
  • Lipid droplets and peroxisomes are co-regulated to drive lifespan extension in response to mono-unsaturated fatty acids. Nature cell biology Papsdorf, K., Miklas, J. W., Hosseini, A., Cabruja, M., Morrow, C. S., Savini, M., Yu, Y., Silva-Garcia, C. G., Haseley, N. R., Murphy, L. M., Yao, P., de Launoit, E., Dixon, S. J., Snyder, M. P., Wang, M. C., Mair, W. B., Brunet, A. 2023

    Abstract

    Dietary mono-unsaturated fatty acids (MUFAs) are linked to longevity in several species. But the mechanisms by which MUFAs extend lifespan remain unclear. Here we show that an organelle network involving lipid droplets and peroxisomes is critical for MUFA-induced longevity in Caenorhabditis elegans. MUFAs upregulate the number of lipid droplets in fat storage tissues. Increased lipid droplet number is necessary for MUFA-induced longevity and predicts remaining lifespan. Lipidomics datasets reveal that MUFAs also modify the ratio of membrane lipids and ether lipids, a signature associated with decreased lipid oxidation. In agreement with this, MUFAs decrease lipid oxidation in middle-aged individuals. Intriguingly, MUFAs upregulate not only lipid droplet number but also peroxisome number. A targeted screen identifies genes involved in the co-regulation of lipid droplets and peroxisomes, and reveals that induction of both organelles is optimal for longevity. Our study uncovers an organelle network involved in lipid homeostasis and lifespan regulation, opening new avenues for interventions to delay aging.

    View details for DOI 10.1038/s41556-023-01136-6

    View details for PubMedID 37127715

  • Organism-wide, cell-type-specific secretome mapping of exercise training in mice. Cell metabolism Wei, W., Riley, N. M., Lyu, X., Shen, X., Guo, J., Raun, S. H., Zhao, M., Moya-Garzon, M. D., Basu, H., Sheng-Hwa Tung, A., Li, V. L., Huang, W., Wiggenhorn, A. L., Svensson, K. J., Snyder, M. P., Bertozzi, C. R., Long, J. Z. 2023

    Abstract

    There is a significant interest in identifying blood-borne factors that mediate tissue crosstalk and function as molecular effectors of physical activity. Although past studies have focused on an individual molecule or cell type, the organism-wide secretome response to physical activity has not been evaluated. Here, we use a cell-type-specific proteomic approach to generate a 21-cell-type, 10-tissue map of exercise training-regulated secretomes in mice. Our dataset identifies >200 exercise training-regulated cell-type-secreted protein pairs, the majority of which have not been previously reported. Pdgfra-cre-labeled secretomes were the most responsive to exercise training. Finally, we show anti-obesity, anti-diabetic, and exercise performance-enhancing activities for proteoforms of intracellular carboxylesterases whose secretion from the liver is induced by exercise training.

    View details for DOI 10.1016/j.cmet.2023.04.011

    View details for PubMedID 37141889

  • Leveraging Physiology and Artificial Intelligence to Deliver Advancements in Healthcare. Physiological reviews Zhang, A., Wu, Z., Wu, E., Wu, M., Snyder, M. P., Zou, J., Wu, J. C. 2023

    Abstract

    Artificial Intelligence (AI) in healthcare has generated remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of AI to transform physiology data to advance healthcare. In this review, we will explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of AI, with special attention to the most relevant AI models. We then detail how physiology data has been harnessed by AI to advance the main areas of healthcare such as automating existing healthcare tasks, increasing access to care, and augmenting healthcare capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying AI models to achieve meaningful clinical impact.

    View details for DOI 10.1152/physrev.00033.2022

    View details for PubMedID 37104717

  • Multi-omics profiling for health. Molecular & cellular proteomics : MCP Babu, M., Snyder, M. 2023: 100561

    Abstract

    The world has witnessed a steady rise in both non-infectious and infectious chronic diseases, prompting a cross-disciplinary approach to understand and treat disease. Current medical care focuses on treating people after they become patients rather than on preventing illness, leading to high costs in treating chronic and late-stage diseases. Additionally, a 'one-size-fits-all' approach to healthcare does not take into account individual differences in genetics, environment, or lifestyle factors, decreasing the number of people benefiting from interventions. Rapid advances in omics technologies and progress in computational capabilities have led to the development of multi-omics deep phenotyping, which profiles the interaction of multiple levels of biology over time and empowers precision health approaches. This review highlights current and emerging multi-omics modalities for precision health and discusses applications in the following areas: genetic variation, cardio-metabolic diseases, cancer, infectious diseases, organ transplantation, pregnancy, and longevity/aging. We will briefly discuss the potential of multi-omics approaches in disentangling host-microbe and host-environmental interactions. We will touch on emerging areas of electronic health record and clinical imaging integration with multi-omics for precision health. Finally, we will briefly discuss the challenges in clinical implementation of multi-omics and their future prospects.

    View details for DOI 10.1016/j.mcpro.2023.100561

    View details for PubMedID 37119971

  • The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles. Genome biology Schreiber, J., Boix, C., Wook Lee, J., Li, H., Guan, Y., Chang, C. C., Chang, J. C., Hawkins-Hooker, A., Schölkopf, B., Schweikert, G., Carulla, M. R., Canakoglu, A., Guzzo, F., Nanni, L., Masseroli, M., Carman, M. J., Pinoli, P., Hong, C., Yip, K. Y., Spence, J. P., Batra, S. S., Song, Y. S., Mahony, S., Zhang, Z., Tan, W., Shen, Y., Sun, Y., Shi, M., Adrian, J., Sandstrom, R., Farrell, N., Halow, J., Lee, K., Jiang, L., Yang, X., Epstein, C., Strattan, J. S., Bernstein, B., Snyder, M., Kellis, M., Stafford, W., Kundaje, A. 2023; 24 (1): 79

    Abstract

    A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.

    View details for DOI 10.1186/s13059-023-02915-y

    View details for PubMedID 37072822

    View details for PubMedCentralID PMC10111747

  • Acetyl-Click Screening Platform Identifies Small-Molecule Inhibitors of Histone Acetyltransferase 1 (HAT1). Journal of medicinal chemistry Gaddameedi, J. D., Chou, T., Geller, B. S., Rangarajan, A., Swaminathan, T. A., Dixon, D., Long, K., Golder, C. J., Vuong, V. A., Banuelos, S., Greenhouse, R., Snyder, M. P., Lipchik, A. M., Gruber, J. J. 2023

    Abstract

    HAT1 is a central regulator of chromatin synthesis that acetylates nascent histone H4. To ascertain whether targeting HAT1 is a viable anticancer treatment strategy, we sought to identify small-molecule inhibitors of HAT1 by developing a high-throughput HAT1 acetyl-click assay. Screening of small-molecule libraries led to the discovery of multiple riboflavin analogs that inhibited HAT1 enzymatic activity. Compounds were refined by synthesis and testing of over 70 analogs, which yielded structure-activity relationships. The isoalloxazine core was required for enzymatic inhibition, whereas modifications of the ribityl side chain improved enzymatic potency and cellular growth suppression. One compound (JG-2016 [24a]) showed relative specificity toward HAT1 compared to other acetyltransferases, suppressed the growth of human cancer cell lines, impaired enzymatic activity in cellulo, and interfered with tumor growth. This is the first report of a small-molecule inhibitor of the HAT1 enzyme complex and represents a step toward targeting this pathway for cancer therapy.

    View details for DOI 10.1021/acs.jmedchem.3c00039

    View details for PubMedID 37027002

  • Withdrawal of 'Precision Neoantigen Discovery Using Large-scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation'. Molecular & cellular proteomics : MCP Pyke, R. M., Mellacheruvu, D., Dea, S., Abbott, C. W., Zhang, S. V., Phillips, N. A., Harris, J., Bartha, G., Desai, S., McClory, R., West, J., Snyder, M. P., Chen, R., Boyle, S. M. 2023; 22 (4): 100511

    View details for DOI 10.1016/j.mcpro.2023.100511

    View details for PubMedID 37019059

  • Leveraging electronic health records to identify risk factors for recurrent pregnancy loss across two medical centers: a case-control study. Research square Roger, J., Xie, F., Costello, J., Tang, A., Liu, J., Oskotsky, T., Woldemariam, S., Kosti, I., Le, B., Snyder, M. P., Giudice, L. C., Torgerson, D., Shaw, G. M., Stevenson, D. K., Rajkovic, A., Glymour, M. M., Aghaeepour, N., Cakmak, H., Lathi, R. B., Sirota, M. 2023

    Abstract

    Recurrent pregnancy loss (RPL), defined as 2 or more pregnancy losses, affects 5-6% of ever-pregnant individuals. Approximately half of these cases have no identifiable explanation. To generate hypotheses about RPL etiologies, we implemented a case-control study comparing the history of over 1,600 diagnoses between RPL and live-birth patients, leveraging the University of California San Francisco (UCSF) and Stanford University electronic health record databases. In total, our study included 8,496 RPL (UCSF: 3,840, Stanford: 4,656) and 53,278 Control (UCSF: 17,259, Stanford: 36,019) patients. Menstrual abnormalities and infertility-associated diagnoses were significantly positively associated with RPL in both medical centers. Age-stratified analysis revealed that the majority of RPL-associated diagnoses had higher odds ratios for patients <35 compared with 35+ patients. While Stanford results were sensitive to control for healthcare utilization, UCSF results were stable across analyses with and without utilization. Intersecting significant results between medical centers was an effective filter to identify associations that are robust across center-specific utilization patterns.

    View details for DOI 10.21203/rs.3.rs-2631220/v1

    View details for PubMedID 36993325

    View details for PubMedCentralID PMC10055527

  • The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell Rozowsky, J., Gao, J., Borsari, B., Yang, Y. T., Galeev, T., Gürsoy, G., Epstein, C. B., Xiong, K., Xu, J., Li, T., Liu, J., Yu, K., Berthel, A., Chen, Z., Navarro, F., Sun, M. S., Wright, J., Chang, J., Cameron, C. J., Shoresh, N., Gaskell, E., Drenkow, J., Adrian, J., Aganezov, S., Aguet, F., Balderrama-Gutierrez, G., Banskota, S., Corona, G. B., Chee, S., Chhetri, S. B., Cortez Martins, G. C., Danyko, C., Davis, C. A., Farid, D., Farrell, N. P., Gabdank, I., Gofin, Y., Gorkin, D. U., Gu, M., Hecht, V., Hitz, B. C., Issner, R., Jiang, Y., Kirsche, M., Kong, X., Lam, B. R., Li, S., Li, B., Li, X., Lin, K. Z., Luo, R., Mackiewicz, M., Meng, R., Moore, J. E., Mudge, J., Nelson, N., Nusbaum, C., Popov, I., Pratt, H. E., Qiu, Y., Ramakrishnan, S., Raymond, J., Salichos, L., Scavelli, A., Schreiber, J. M., Sedlazeck, F. J., See, L. H., Sherman, R. M., Shi, X., Shi, M., Sloan, C. A., Strattan, J. S., Tan, Z., Tanaka, F. Y., Vlasova, A., Wang, J., Werner, J., Williams, B., Xu, M., Yan, C., Yu, L., Zaleski, C., Zhang, J., Ardlie, K., Cherry, J. M., Mendenhall, E. M., Noble, W. S., Weng, Z., Levine, M. E., Dobin, A., Wold, B., Mortazavi, A., Ren, B., Gillis, J., Myers, R. M., Snyder, M. P., Choudhary, J., Milosavljevic, A., Schatz, M. C., Bernstein, B. E., Guigó, R., Gingeras, T. R., Gerstein, M. 2023; 186 (7): 1493-1511.e40

    Abstract

    Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.

    View details for DOI 10.1016/j.cell.2023.02.018

    View details for PubMedID 37001506

  • Advances and potential of omics studies for understanding the development of food allergy. Frontiers in allergy Sindher, S. B., Chin, A. R., Aghaeepour, N., Prince, L., Maecker, H., Shaw, G. M., Stevenson, D. K., Nadeau, K. C., Snyder, M., Khatri, P., Boyd, S. D., Winn, V. D., Angst, M. S., Chinthrajah, R. S. 2023; 4: 1149008

    Abstract

    The prevalence of food allergy continues to rise globally, carrying with it substantial safety, economic, and emotional burdens. Although preventative strategies do exist, the heterogeneity of allergy trajectories and clinical phenotypes has made it difficult to identify patients who would benefit from these strategies. Therefore, further studies investigating the molecular mechanisms that differentiate these trajectories are needed. Large-scale omics studies have identified key insights into the molecular mechanisms for many different diseases; however, the application of these technologies to uncover the drivers of food allergy development is in its infancy. Here we review the use of omics approaches in food allergy and highlight key gaps in knowledge for applying these technologies for the characterization of food allergy development.

    View details for DOI 10.3389/falgy.2023.1149008

    View details for PubMedID 37034151

    View details for PubMedCentralID PMC10080041

  • Proinflammatory polarization of monocytes by particulate air pollutants is mediated by induction of trained immunity in pediatric asthma. Allergy Movassagh, H., Prunicki, M., Kaushik, A., Zhou, X., Dunham, D., Smith, E. M., He, Z., Aleman Muench, G. R., Shi, M., Weimer, A. K., Cao, S., Andorf, S., Feizi, A., Snyder, M. P., Soroosh, P., Mellins, E. D., Nadeau, K. C. 2023

    Abstract

    The impact of exposure to air pollutants, such as fine particulate matter (PM), on the immune system and its consequences on pediatric asthma, are not well understood. We investigated whether ambient levels of fine PM with aerodynamic diameter ≤2.5 microns (PM2.5) are associated with alterations in circulating monocytes in children with or without asthma. Monocyte phenotyping was performed by cytometry time-of-flight (CyTOF). Cytokines were measured using cytometric bead array and Luminex assay. ChIP-Seq was utilized to address histone modifications in monocytes. Increased exposure to ambient PM2.5 was linked to specific monocyte subtypes, particularly in children with asthma. Mechanistically, we hypothesized that innate trained immunity is evoked by a primary exposure to fine PM and accounts for an enhanced inflammatory response after secondary stimulation in vitro. We determined that the trained immunity was induced in circulating monocytes by fine particulate pollutants, and it was characterized by the upregulation of proinflammatory mediators, such as TNF, IL-6, and IL-8, upon stimulation with house dust mite or lipopolysaccharide. This phenotype was epigenetically controlled by enhanced H3K27ac marks in circulating monocytes. The specific alterations of monocytes after ambient pollution exposure suggest a possible prognostic immune signature for pediatric asthma, and pollution-induced trained immunity may provide a potential therapeutic target for asthmatic children living in areas with increased air pollution.

    View details for DOI 10.1111/all.15692

    View details for PubMedID 36929161

  • Biomonitoring and precision health in deep space supported by artificial intelligence NATURE MACHINE INTELLIGENCE Scott, R. T., Sanders, L. M., Antonsen, E. L., Hastings, J. A., Park, S., Mackintosh, G., Reynolds, R. J., Hoarfrost, A. L., Sawyer, A., Greene, C. S., Glicksberg, B. S., Theriot, C. A., Berrios, D. C., Miller, J., Babdor, J., Barker, R., Baranzini, S. E., Beheshti, A., Chalk, S., Delgado-Aparicio, G. M., Haendel, M., Hamid, A. A., Heller, P., Jamieson, D., Jarvis, K. J., Kalantari, J., Khezeli, K., Komarova, S. V., Komorowski, M., Kothiyal, P., Mahabal, A., Manor, U., Martin, H., Mason, C. E., Matar, M., Mias, G. I., Myers Jr, J. G., Nelson, C., Oribello, J., Parsons-Wingerter, P., Prabhu, R. K., Qutub, A., Rask, J., Saravia-Butler, A., Saria, S., Singh, N., Snyder, M., Soboczenski, F., Soman, K., Van Valen, D., Venkateswaran, K., Warren, L., Worthey, L., Yang, J. H., Zitnik, M., Costes, S. V. 2023; 5 (3): 196-207
  • Biological research and self-driving labs in deep space supported by artificial intelligence NATURE MACHINE INTELLIGENCE Sanders, L. M., Scott, R. T., Yang, J. H., Qutub, A., Garcia Martin, H., Berrios, D. C., Hastings, J. A., Rask, J., Mackintosh, G., Hoarfrost, A. L., Chalk, S., Kalantari, J., Khezeli, K., Antonsen, E. L., Babdor, J., Barker, R., Baranzini, S. E., Beheshti, A., Delgado-Aparicio, G. M., Glicksberg, B. S., Greene, C. S., Haendel, M., Hamid, A. A., Heller, P., Jamieson, D., Jarvis, K. J., Komarova, S. V., Komorowski, M., Kothiyal, P., Mahabal, A., Manor, U., Mason, C. E., Matar, M., Mias, G. I., Miller, J., Myers Jr, J. G., Nelson, C., Oribello, J., Park, S., Parsons-Wingerter, P., Prabhu, R. K., Reynolds, R. J., Saravia-Butler, A., Saria, S., Sawyer, A., Singh, N., Snyder, M., Soboczenski, F., Soman, K., Theriot, C. A., Van Valen, D., Venkateswaran, K., Warren, L., Worthey, L., Zitnik, M., Costes, S. V. 2023; 5 (3): 208-219
  • Sensor-enabled Multilayer Artificial Intelligence Analysis for Predictive Wound Healing and Real-Time Patient Monitoring Trotsyuk, A. A., Jing, S., Chen, K., Henn, D., Jiang, Y., Niu, S., Sivaraj, D., Nag, R., Snyder, M., Bao, Z., Gurtner, G. C. WILEY. 2023: 268-269
  • Simultaneous profiling of host expression and microbial abundance by spatial metatranscriptome sequencing. Genome research Lyu, L., Li, X., Feng, R., Zhou, X., Guha, T. K., Yu, X., Chen, G. Q., Yao, Y., Su, B., Zou, D., Snyder, M. P., Chen, L. 2023; 33 (3): 401-411

    Abstract

    We developed an analysis pipeline that can extract microbial sequences from spatial transcriptomic (ST) data and assign taxonomic labels, generating a spatial microbial abundance matrix in addition to the default host expression matrix, enabling simultaneous analysis of host expression and microbial distribution. We called the pipeline spatial metatranscriptome (SMT) and applied it on both human and murine intestinal sections and validated the spatial microbial abundance information with alternative assays. Biological insights were gained from these novel data that showed host-microbe interaction at various spatial scales. Finally, we tested experimental modification that can increase microbial capture while preserving host spatial expression quality and, by use of positive controls, quantitatively showed the capture efficiency and recall of our methods. This proof-of-concept work shows the feasibility of SMT analysis and paves the way for further experimental optimization and application.

    View details for DOI 10.1101/gr.277178.122

    View details for PubMedID 37310927

  • Precision neoantigen discovery using large-scale immunopeptidomes and composite modeling of MHC peptide presentation. Molecular & cellular proteomics : MCP Pyke, R. M., Mellacheruvu, D., Dea, S., Abbott, C., Zhang, S. V., Phillips, N. A., Harris, J., Bartha, G., Desai, S., McClory, R., West, J., Snyder, M. P., Chen, R., Boyle, S. M. 2023: 100506

    Abstract

    Major histocompatibility complex (MHC)-bound peptides that originate from tumor-specific genetic alterations, known as neoantigens, are an important class of anti-cancer therapeutic targets. Accurately predicting peptide presentation by MHC complexes is a key aspect of discovering therapeutically relevant neoantigens. Technological improvements in mass-spectrometry-based immunopeptidomics and advanced modeling techniques have vastly improved MHC presentation prediction over the past two decades. However, improvement in the sensitivity and specificity of prediction algorithms is needed for clinical applications such as the development of personalized cancer vaccines, the discovery of biomarkers for response to checkpoint blockade and the quantification of autoimmune risk in gene therapies. Toward this end, we generated allele-specific immunopeptidomics data using 25 mono-allelic cell lines and created Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), a pan-allelic MHC-peptide algorithm for predicting MHC-peptide binding and presentation. In contrast to previously published large-scale mono-allelic data, we used an HLA-null K562 parental cell line and a stable transfection of HLA alleles to better emulate native presentation. Our dataset includes five previously unprofiled alleles that expand MHC binding pocket diversity in the training data and extend allelic coverage in underprofiled populations. To improve generalizability, SHERPA systematically integrates 128 mono-allelic and 384 multi-allelic samples with publicly available immunoproteomics data and binding assay data. Using this dataset, we developed two features that empirically estimate the propensities of genes and specific regions within gene bodies to engender immunopeptides to represent antigen processing. 
    Using a composite model constructed with gradient boosting decision trees, multi-allelic deconvolution and 2.15 million peptides encompassing 167 alleles, we achieved a 1.44-fold improvement of positive predictive value compared to existing tools when evaluated on independent mono-allelic datasets and a 1.17-fold improvement when evaluated on tumor samples. With a high degree of accuracy, SHERPA has the potential to enable precision neoantigen discovery for future clinical applications.

    View details for DOI 10.1016/j.mcpro.2023.100506

    View details for PubMedID 36796642

  • Challenging obesity and sex based differences in resting energy expenditure using allometric modeling, a sub-study of the DIETFITS clinical trial. Clinical nutrition ESPEN Haddad, F., Li, X., Perelman, D., Santana, E. J., Kuznetsova, T., Cauwenberghs, N., Busque, V., Contrepois, K., Snyder, M. P., Leonard, M. B., Gardner, C. 2023; 53: 43-52

    Abstract

    BACKGROUND & AIMS: Resting energy expenditure (REE) is a major component of energy balance. While REE is usually indexed to total body weight (BW), this may introduce biases when assessing REE in obesity or during weight loss intervention. The main objective of the study was to quantify the bias introduced by ratiometric scaling of REE using BW both at baseline and following weight loss intervention. DESIGN: Participants in the DIETFITS Study (Diet Intervention Examining The Factors Interacting with Treatment Success) who completed indirect calorimetry and dual-energy X-ray absorptiometry (DXA) were included in the study. Data were available in 438 participants at baseline, 340 at 6 months and 323 at 12 months. We used multiplicative allometric modeling based on lean body mass (LBM) and fat mass (FM) to derive body size independent scaling of REE. Longitudinal changes in indexed REE were then assessed following weight loss intervention. RESULTS: A multiplicative model including LBM, FM, age, Black race and the double product (DP) of systolic blood pressure and heart rate explained 79% of variance in REE. REE indexed to [LBM^0.66*FM^0.066] was body size and sex independent (p=0.91 and p=0.73, respectively) in contrast to BW based indexing which showed a significant inverse relationship to BW (r=-0.47 for female and r=-0.44 for male, both p<0.001). When indexed to BW, significant baseline differences in REE were observed between male and female (p<0.001) and between individuals who are overweight and obese (p<0.001), while no significant differences were observed when REE was indexed to [LBM^0.66*FM^0.066] (p>0.05). Percentage predicted REE adjusted for LBM, FM and DP remained stable following weight loss intervention (p=0.614). CONCLUSION: Allometric scaling of REE based on LBM and FM removes body composition-associated biases and should be considered in obesity and weight-based intervention studies.

    View details for DOI 10.1016/j.clnesp.2022.11.015

    View details for PubMedID 36657929

  • Stem cell plasticity, acetylation of H3K14, and de novo gene activation rely on KAT7. Cell reports Kueh, A. J., Bergamasco, M. I., Quaglieri, A., Phipson, B., Li-Wai-Suen, C. S., Lonnstedt, I. M., Hu, Y., Feng, Z., Woodruff, C., May, R. E., Wilcox, S., Garnham, A. L., Snyder, M. P., Smyth, G. K., Speed, T. P., Thomas, T., Voss, A. K. 2023; 42 (1): 111980

    Abstract

    In the conventional model of transcriptional activation, transcription factors bind to response elements and recruit co-factors, including histone acetyltransferases. Contrary to this model, we show that the histone acetyltransferase KAT7 (HBO1/MYST2) is required genome wide for histone H3 lysine 14 acetylation (H3K14ac). Examining neural stem cells, we find that KAT7 and H3K14ac are present not only at transcribed genes but also at inactive genes, intergenic regions, and in heterochromatin. KAT7 and H3K14ac were not required for the continued transcription of genes that were actively transcribed at the time of loss of KAT7 but indispensable for the activation of repressed genes. The absence of KAT7 abrogates neural stem cell plasticity, diverse differentiation pathways, and cerebral cortex development. Re-expression of KAT7 restored stem cell developmental potential. Overexpression of KAT7 enhanced neuron and oligodendrocyte differentiation. Our data suggest that KAT7 prepares chromatin for transcriptional activation and is a prerequisite for gene activation.

    View details for DOI 10.1016/j.celrep.2022.111980

    View details for PubMedID 36641753

  • Multiomic identification of key transcriptional regulatory programs during endurance exercise training. bioRxiv : the preprint server for biology Smith, G. R., Zhao, B., Lindholm, M. E., Raja, A., Viggars, M., Pincas, H., Gay, N. R., Sun, Y., Ge, Y., Nair, V. D., Sanford, J. A., S Amper, M. A., Vasoya, M., Smith, K. S., Montgomery, S., Zaslavsky, E., Bodine, S. C., Esser, K. A., Walsh, M. J., Snyder, M. P., Sealfon, S. C., MoTrPAC Study Group 2023

    Abstract

    Transcription factors (TFs) play a key role in regulating gene expression and responses to stimuli. We conducted an integrated analysis of chromatin accessibility and RNA expression across various rat tissues following endurance exercise training (EET) to map epigenomic changes to transcriptional changes and determine key TFs involved. We uncovered tissue-specific changes across both omic layers, including highly correlated differentially accessible regions (DARs) and differentially expressed genes (DEGs). We identified open chromatin regions associated with DEGs (DEGaPs) and found tissue-specific and genomic feature-specific TF motif enrichment patterns among both DARs and DEGaPs. Accessible promoters of up- vs. down-regulated DEGs per tissue showed distinct TF enrichment patterns. Further, some EET-induced TFs in skeletal muscle were either validated at the proteomic level (MEF2C and NUR77) or correlated with exercise-related phenotypic changes. We provide an in-depth analysis of the epigenetic and trans-factor-dependent processes governing gene expression during EET.

    View details for DOI 10.1101/2023.01.10.523450

    View details for PubMedID 36711841

  • Low expression of EXOSC2 protects against clinical COVID-19 and impedes SARS-CoV-2 replication. Life science alliance Moll, T., Odon, V., Harvey, C., Collins, M. O., Peden, A., Franklin, J., Graves, E., Marshall, J. N., Dos Santos Souza, C., Zhang, S., Castelli, L., Hautbergue, G., Azzouz, M., Gordon, D., Krogan, N., Ferraiuolo, L., Snyder, M. P., Shaw, P. J., Rehwinkel, J., Cooper-Knock, J. 2023; 6 (1)

    Abstract

    New therapeutic targets are a valuable resource for treatment of SARS-CoV-2 viral infection. Genome-wide association studies have identified risk loci associated with COVID-19, but many loci are associated with comorbidities and are not specific to host-virus interactions. Here, we identify and experimentally validate a link between reduced expression of EXOSC2 and reduced SARS-CoV-2 replication. EXOSC2 was one of the 332 host proteins examined, all of which interact directly with SARS-CoV-2 proteins. Aggregating COVID-19 genome-wide association studies statistics for gene-specific eQTLs revealed an association between increased expression of EXOSC2 and higher risk of clinical COVID-19. EXOSC2 interacts with Nsp8 which forms part of the viral RNA polymerase. EXOSC2 is a component of the RNA exosome, and here, LC-MS/MS analysis of protein pulldowns demonstrated interaction between the SARS-CoV-2 RNA polymerase and most of the human RNA exosome components. CRISPR/Cas9 introduction of nonsense mutations within EXOSC2 in Calu-3 cells reduced EXOSC2 protein expression and impeded SARS-CoV-2 replication without impacting cellular viability. Targeted depletion of EXOSC2 may be a safe and effective strategy to protect against clinical COVID-19.

    View details for DOI 10.26508/lsa.202201449

    View details for PubMedID 36241425

  • Harnessing human genetics and stem cells for precision cardiovascular medicine. Cell Genomics Caudal, A., Snyder, M. P., Wu, J. C. 2023
  • Leveraging Mobile Technology for Public Health Promotion: A Multidisciplinary Perspective. Annual review of public health Hicks, J. L., Boswell, M. A., Althoff, T., Crum, A. J., Ku, J. P., Landay, J. A., Moya, P. M., Murnane, E. L., Snyder, M. P., King, A. C., Delp, S. L. 2022

    Abstract

    Health behaviors are inextricably linked to health and well-being, yet issues such as physical inactivity and insufficient sleep remain significant global public health problems. Mobile technology, and the unprecedented scope and quantity of data it generates, has a promising but largely untapped potential to promote health behaviors at the individual and population levels. This perspective article provides multidisciplinary recommendations on the design and use of mobile technology, and the concomitant wealth of data, to promote behaviors that support overall health. Using physical activity as an exemplar health behavior, we review emerging strategies for health behavior change interventions. We describe progress on personalizing interventions to an individual and their social, cultural, and built environments, as well as on evaluating relationships between mobile technology data and health to establish evidence-based guidelines. In reviewing these strategies and highlighting directions for future research, we advance the use of theory-based, personalized, and human-centered approaches in promoting health behaviors. Expected final online publication date for the Annual Review of Public Health, Volume 44 is April 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

    View details for DOI 10.1146/annurev-publhealth-060220-041643

    View details for PubMedID 36542772

  • Gut microbiota analyses of Saudi populations for type 2 diabetes-related phenotypes reveals significant association. BMC microbiology Al-Muhanna, F. A., Dowdell, A. K., Al Eleq, A. H., Albaker, W. I., Brooks, A. W., Al-Sultan, A. I., Al-Rubaish, A. M., Alkharsah, K. R., Sulaiman, R. M., Al-Quorain, A. A., Cyrus, C., Alali, R. A., Vatte, C., Robinson, F. L., Zhou, X., Snyder, M. P., Almuhanna, A. F., Keating, B. J., Piening, B. D., Al-Ali, A. K. 2022; 22 (1): 301

    Abstract

    Large-scale gut microbiome sequencing has revealed key links between microbiome dysfunction and metabolic diseases such as type 2 diabetes (T2D). To date, these efforts have largely focused on Western populations, with few studies assessing T2D microbiota associations in Middle Eastern communities where T2D prevalence is now over 20%. We analyzed the composition of stool 16S rRNA from 461 T2D and 119 non-T2D participants from the Eastern Province of Saudi Arabia. We quantified the abundance of microbial communities to examine any significant differences between subpopulations of samples based on diabetes status and glucose level. In this study we performed the largest microbiome study ever conducted in Saudi Arabia, as well as the first-ever characterization of gut microbiota T2D versus non-T2D in this population. We observed overall positive enrichment within diabetics compared to healthy individuals and amongst diabetic participants; those with high glucose levels exhibited slightly more positive enrichment compared to those at lower risk of fasting hyperglycemia. In particular, the genus Firmicutes was upregulated in diabetic individuals compared to non-diabetic individuals, and T2D was associated with an elevated Firmicutes/Bacteroidetes ratio, consistent with previous findings. Based on diabetes status and glucose levels of Saudi participants, relatively stable differences in stool composition were perceived by differential abundance and alpha diversity measures. However, community level differences are evident in the Saudi population between T2D and non-T2D individuals, and diversity patterns appear to vary from well-characterized microbiota from Western cohorts. Comparing overlapping and varying patterns in gut microbiota with other studies is critical to assessing novel treatment options in light of a rapidly growing T2D health epidemic in the region. As a rapidly emerging chronic condition in Saudi Arabia and the Middle East, T2D burdens have grown more quickly and affect larger proportions of the population than any other global region, making a regional reference T2D-microbiome dataset critical to understanding the nuances of disease development on a global scale.

    View details for DOI 10.1186/s12866-022-02714-8

    View details for PubMedID 36510121

  • Early prediction and longitudinal modeling of preeclampsia from multiomics. Patterns (New York, N.Y.) Maric, I., Contrepois, K., Moufarrej, M. N., Stelzer, I. A., Feyaerts, D., Han, X., Tang, A., Stanley, N., Wong, R. J., Traber, G. M., Ellenberger, M., Chang, A. L., Fallahzadeh, R., Nassar, H., Becker, M., Xenochristou, M., Espinosa, C., De Francesco, D., Ghaemi, M. S., Costello, E. K., Culos, A., Ling, X. B., Sylvester, K. G., Darmstadt, G. L., Winn, V. D., Shaw, G. M., Relman, D. A., Quake, S. R., Angst, M. S., Snyder, M. P., Stevenson, D. K., Gaudilliere, B., Aghaeepour, N. 2022; 3 (12): 100655

    Abstract

    Preeclampsia is a complex disease of pregnancy whose physiopathology remains unclear. We developed machine-learning models for early prediction of preeclampsia (first 16 weeks of pregnancy) and over gestation by analyzing six omics datasets from a longitudinal cohort of pregnant women. For early pregnancy, a prediction model using nine urine metabolites had the highest accuracy and was validated on an independent cohort (area under the receiver-operating characteristic curve [AUC] = 0.88, 95% confidence interval [CI] [0.76, 0.99] cross-validated; AUC = 0.83, 95% CI [0.62, 1] validated). Univariate analysis demonstrated statistical significance of identified metabolites. An integrated multiomics model further improved accuracy (AUC = 0.94). Several biological pathways were identified including tryptophan, caffeine, and arachidonic acid metabolisms. Integration with immune cytometry data suggested novel associations between immune and proteomic dynamics. While further validation in a larger population is necessary, these encouraging results can serve as a basis for a simple, early diagnostic test for preeclampsia.

    View details for DOI 10.1016/j.patter.2022.100655

    View details for PubMedID 36569558

  • Wireless, closed-loop, smart bandage with integrated sensors and stimulators for advanced wound care and accelerated healing. Nature biotechnology Jiang, Y., Trotsyuk, A. A., Niu, S., Henn, D., Chen, K., Shih, C. C., Larson, M. R., Mermin-Bunnell, A. M., Mittal, S., Lai, J. C., Saberi, A., Beard, E., Jing, S., Zhong, D., Steele, S. R., Sun, K., Jain, T., Zhao, E., Neimeth, C. R., Viana, W. G., Tang, J., Sivaraj, D., Padmanabhan, J., Rodrigues, M., Perrault, D. P., Chattopadhyay, A., Maan, Z. N., Leeolou, M. C., Bonham, C. A., Kwon, S. H., Kussie, H. C., Fischer, K. S., Gurusankar, G., Liang, K., Zhang, K., Nag, R., Snyder, M. P., Januszyk, M., Gurtner, G. C., Bao, Z. 2022

    Abstract

    'Smart' bandages based on multimodal wearable devices could enable real-time physiological monitoring and active intervention to promote healing of chronic wounds. However, there has been limited development in incorporation of both sensors and stimulators for the current smart bandage technologies. Additionally, while adhesive electrodes are essential for robust signal transduction, detachment of existing adhesive dressings can lead to secondary damage to delicate wound tissues without switchable adhesion. Here we overcome these issues by developing a flexible bioelectronic system consisting of wirelessly powered, closed-loop sensing and stimulation circuits with skin-interfacing hydrogel electrodes capable of on-demand adhesion and detachment. In mice, we demonstrate that our wound care system can continuously monitor skin impedance and temperature and deliver electrical stimulation in response to the wound environment. Across preclinical wound models, the treatment group healed ~25% more rapidly and with ~50% enhancement in dermal remodeling compared with control. Further, we observed activation of proregenerative genes in monocyte and macrophage cell populations, which may enhance tissue regeneration, neovascularization and dermal recovery.

    View details for DOI 10.1038/s41587-022-01528-3

    View details for PubMedID 36424488

    View details for PubMedCentralID 5350204

  • Author Correction: Prediction of gestational age using urinary metabolites in term and preterm pregnancies. Scientific reports Contrepois, K., Chen, S., Ghaemi, M. S., Wong, R. J., Jehan, F., Sazawal, S., Baqui, A. H., Stringer, J. S., Rahman, A., Nisar, M. I., Dhingra, U., Khanam, R., Ilyas, M., Dutta, A., Mehmood, U., Deb, S., Hotwani, A., Ali, S. M., Rahman, S., Nizar, A., Ame, S. M., Muhammad, S., Chauhan, A., Khan, W., Raqib, R., Das, S., Ahmed, S., Hasan, T., Khalid, J., Juma, M. H., Chowdhury, N. H., Kabir, F., Aftab, F., Quaiyum, A., Manu, A., Yoshida, S., Bahl, R., Pervin, J., Price, J. T., Rahman, M., Kasaro, M. P., Litch, J. A., Musonda, P., Vwalika, B., Shaw, G., Stevenson, D. K., Aghaeepour, N., Snyder, M. P. 2022; 12 (1): 19753

    View details for DOI 10.1038/s41598-022-23715-7

    View details for PubMedID 36396676

  • Annotation of spatially resolved single-cell data with STELLAR. Nature methods Brbic, M., Cao, K., Hickey, J. W., Tan, Y., Snyder, M. P., Nolan, G. P., Leskovec, J. 2022

    Abstract

    Accurate cell-type annotation from spatially resolved single cells is crucial to understand functional spatial biology that is the basis of tissue organization. However, current computational methods for annotating spatially resolved single-cell data are typically based on techniques established for dissociated single-cell technologies and thus do not take spatial organization into account. Here we present STELLAR, a geometric deep learning method for cell-type discovery and identification in spatially resolved single-cell datasets. STELLAR automatically assigns cells to cell types present in the annotated reference dataset and discovers novel cell types and cell states. STELLAR transfers annotations across different dissection regions, different tissues and different donors, and learns cell representations that capture higher-order tissue structures. We successfully applied STELLAR to CODEX multiplexed fluorescent microscopy data and multiplexed RNA imaging datasets. Within the Human BioMolecular Atlas Program, STELLAR has annotated 2.6 million spatially resolved single cells with dramatic time savings.

    View details for DOI 10.1038/s41592-022-01651-8

    View details for PubMedID 36280720

  • The metabolomics of human aging: Advances, challenges, and opportunities. Science advances Panyard, D. J., Yu, B., Snyder, M. P. 2022; 8 (42): eadd6155

    Abstract

    As the global population becomes older, understanding the impact of aging on health and disease becomes paramount. Recent advancements in multiomic technology have allowed for the high-throughput molecular characterization of aging at the population level. Metabolomics studies that analyze the small molecules in the body can provide biological information across a diversity of aging processes. Here, we review the growing body of population-scale metabolomics research on aging in humans, identifying the major trends in the field, implicated biological pathways, and how these pathways relate to health and aging. We conclude by assessing the main challenges in the research to date, opportunities for advancing the field, and the outlook for precision health applications.

    View details for DOI 10.1126/sciadv.add6155

    View details for PubMedID 36260671

  • Leveraging electronic health record data to identify phenotypes associated with pregnancy loss may lead to improved understanding of recurrent pregnancy loss. Roger, J., Tang, A., Woldemariam, S., Oskotsky, T., Wen, T., Liu, J., Kosti, I., Le, B., Cakmak, H., Snyder, M., Aghaeepour, N., Shaw, G., Stevenson, D., Giudice, L. C., Glymour, M., Rajkovic, A., Lathi, R., Sirota, M. ELSEVIER SCIENCE INC. 2022: E107
  • Precision Medicine Approaches to Mental Healthcare. Physiology (Bethesda, Md.) Scala, J. J., Ganz, A. B., Snyder, M. P. 2022

    Abstract

    By developing a more comprehensive understanding of the physiological underpinnings of mental illness, precision medicine has the potential to revolutionize psychiatric care. With recent breakthroughs in next-generation multi-omics technologies and data analytics, it is becoming more feasible to leverage multimodal biomarkers, from genetic variants to neuroimaging biomarkers, to objectify diagnostics and treatment decisions in psychiatry and improve patient outcomes. Ongoing work in precision psychiatry will parallel progress in precision oncology and cardiology to develop an expanded suite of blood- and neuroimaging-based diagnostic tests, empower monitoring of treatment efficacy over time, and reduce patient exposure to ineffective treatments. The emerging model of precision psychiatry has the potential to mitigate some of psychiatry's most pressing issues, including improving disease classification, lengthy treatment duration, and suboptimal treatment outcomes. This narrative-style review summarizes some of the emerging breakthroughs and recurring challenges in the application of precision medicine approaches to mental healthcare.

    View details for DOI 10.1152/physiol.00013.2022

    View details for PubMedID 36099270

  • A method for intelligent allocation of diagnostic testing by leveraging data from commercial wearable devices: a case study on COVID-19. NPJ digital medicine Shandhi, M. M., Cho, P. J., Roghanizad, A. R., Singh, K., Wang, W., Enache, O. M., Stern, A., Sbahi, R., Tatar, B., Fiscus, S., Khoo, Q. X., Kuo, Y., Lu, X., Hsieh, J., Kalodzitsa, A., Bahmani, A., Alavi, A., Ray, U., Snyder, M. P., Ginsburg, G. S., Pasquale, D. K., Woods, C. W., Shaw, R. J., Dunn, J. P. 2022; 5 (1): 130

    Abstract

    Mass surveillance testing can help control outbreaks of infectious diseases such as COVID-19. However, diagnostic test shortages are prevalent globally and continue to occur in the US with the onset of new COVID-19 variants and emerging diseases like monkeypox, demonstrating an unprecedented need for improving our current methods for mass surveillance testing. By targeting surveillance testing toward individuals who are most likely to be infected and, thus, increasing the testing positivity rate (i.e., percent positive in the surveillance group), fewer tests are needed to capture the same number of positive cases. Here, we developed an Intelligent Testing Allocation (ITA) method by leveraging data from the CovIdentify study (6765 participants) and the MyPHD study (8580 participants), including smartwatch data from 1265 individuals of whom 126 tested positive for COVID-19. Our rigorous model and parameter search uncovered the optimal time periods and aggregate metrics for monitoring continuous digital biomarkers to increase the positivity rate of COVID-19 diagnostic testing. We found that resting heart rate (RHR) features distinguished between COVID-19-positive and -negative cases earlier in the course of the infection than steps features, as early as 10 and 5 days prior to the diagnostic test, respectively. We also found that including steps features increased the area under the receiver operating characteristic curve (AUC-ROC) by 7-11% when compared with RHR features alone, while including RHR features improved the AUC of the ITA model's precision-recall curve (AUC-PR) by 38-50% when compared with steps features alone. The best AUC-ROC (0.73±0.14 and 0.77 on the cross-validated training set and independent test set, respectively) and AUC-PR (0.55±0.21 and 0.24) were achieved by using data from a single device type (Fitbit) with high-resolution (minute-level) data. Finally, we show that ITA generates up to a 6.5-fold increase in the positivity rate in the cross-validated training set and up to a 4.5-fold increase in the positivity rate in the independent test set, including both symptomatic and asymptomatic (up to 27%) individuals. Our findings suggest that, if deployed on a large scale and without needing self-reported symptoms, the ITA method could improve the allocation of diagnostic testing resources and reduce the burden of test shortages.

    View details for DOI 10.1038/s41746-022-00672-z

    View details for PubMedID 36050372

  • Deploying wearable sensors for pandemic mitigation: A counterfactual modelling study of Canada's second COVID-19 wave. PLOS digital health Duarte, N., Arora, R. K., Bennett, G., Wang, M., Snyder, M. P., Cooperstock, J. R., Wagner, C. E. 2022; 1 (9): e0000100

    Abstract

    Wearable sensors can continuously and passively detect potential respiratory infections before or absent symptoms. However, the population-level impact of deploying these devices during pandemics is unclear. We built a compartmental model of Canada's second COVID-19 wave and simulated wearable sensor deployment scenarios, systematically varying detection algorithm accuracy, uptake, and adherence. With current detection algorithms and 4% uptake, we observed a 16% reduction in the second wave burden of infection; however, 22% of this reduction was attributed to incorrectly quarantining uninfected device users. Improving detection specificity and offering confirmatory rapid tests each minimized unnecessary quarantines and lab-based tests. With a sufficiently low false positive rate, increasing uptake and adherence became effective strategies for scaling averted infections. We concluded that wearable sensors capable of detecting presymptomatic or asymptomatic infections have potential to help reduce the burden of infection during a pandemic; in the case of COVID-19, technology improvements or supporting measures are required to keep social and resource costs sustainable.

    View details for DOI 10.1371/journal.pdig.0000100

    View details for PubMedID 36812624

  • KLF4 recruits SWI/SNF to increase chromatin accessibility and reprogram the endothelial enhancer landscape under laminar shear stress. Nature communications Moonen, J. R., Chappell, J., Shi, M., Shinohara, T., Li, D., Mumbach, M. R., Zhang, F., Nair, R. V., Nasser, J., Mai, D. H., Taylor, S., Wang, L., Metzger, R. J., Chang, H. Y., Engreitz, J. M., Snyder, M. P., Rabinovitch, M. 2022; 13 (1): 4941

    Abstract

    Physiologic laminar shear stress (LSS) induces an endothelial gene expression profile that is vasculo-protective. In this report, we delineate how LSS mediates changes in the epigenetic landscape to promote this beneficial response. We show that under LSS, KLF4 interacts with the SWI/SNF nucleosome remodeling complex to increase accessibility at enhancer sites that promote the expression of homeostatic endothelial genes. By combining molecular and computational approaches we discover enhancers that loop to promoters of KLF4- and LSS-responsive genes that stabilize endothelial cells and suppress inflammation, such as BMPR2, SMAD5, and DUSP5. By linking enhancers to genes that they regulate under physiologic LSS, our work establishes a foundation for interpreting how non-coding DNA variants in these regions might disrupt protective gene expression to influence vascular disease.

    View details for DOI 10.1038/s41467-022-32566-9

    View details for PubMedID 35999210

  • Deep learning-based pseudo-mass spectrometry imaging analysis for precision medicine. Briefings in bioinformatics Shen, X., Shao, W., Wang, C., Liang, L., Chen, S., Zhang, S., Rusu, M., Snyder, M. P. 2022

    Abstract

    Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics provides systematic profiling of metabolites. Yet, its applications in precision medicine (disease diagnosis) have been limited by several challenges, including metabolite identification, information loss and low reproducibility. Here, we present the deep-learning-based Pseudo-Mass Spectrometry Imaging (deepPseudoMSI) project (https://www.deeppseudomsi.org/), which converts LC-MS raw data to pseudo-MS images and then processes them by deep learning for precision medicine, such as disease diagnosis. Extensive tests based on real data demonstrated the superiority of deepPseudoMSI over traditional approaches and the capacity of our method to achieve an accurate individualized diagnosis. Our framework lays the foundation for future metabolic-based precision medicine.

    View details for DOI 10.1093/bib/bbac331

    View details for PubMedID 35947990

  • Transcriptome variation in human tissues revealed by long-read sequencing. Nature Glinos, D. A., Garborcauskas, G., Hoffman, P., Ehsan, N., Jiang, L., Gokden, A., Dai, X., Aguet, F., Brown, K. L., Garimella, K., Bowers, T., Costello, M., Ardlie, K., Jian, R., Tucker, N. R., Ellinor, P. T., Harrington, E. D., Tang, H., Snyder, M., Juul, S., Mohammadi, P., MacArthur, D. G., Lappalainen, T., Cummings, B. B. 2022

    Abstract

    Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.

    View details for DOI 10.1038/s41586-022-05035-y

    View details for PubMedID 35922509

  • Reply to 'Lactate as a major myokine and exerkine'. Nature reviews. Endocrinology Chow, L. S., Gerszten, R. E., Taylor, J. M., Pedersen, B. K., van Praag, H., Trappe, S., Febbraio, M. A., Galis, Z. S., Gao, Y., Haus, J. M., Lanza, I. R., Lavie, C. J., Lee, C., Lucia, A., Moro, C., Pandey, A., Robbins, J. M., Stanford, K. I., Thackray, A. E., Villeda, S., Watt, M. J., Xia, A., Zierath, J. R., Goodpaster, B. H., Snyder, M. 2022

    View details for DOI 10.1038/s41574-022-00726-y

    View details for PubMedID 35915255

  • DSIF modulates RNA polymerase II occupancy according to template G + C content. NAR Genomics and Bioinformatics Deng, N., Zhang, Y., Ma, Z., Lin, R., Cheng, T., Tang, H., Snyder, M. P., Cohen, S. N. 2022; 4 (3)
  • Robust Identification of Temporal Biomarkers in Longitudinal Omics Studies. Bioinformatics (Oxford, England) Metwally, A. A., Zhang, T., Wu, S., Kellogg, R., Zhou, W., Contrepois, K., Tang, H., Snyder, M. 2022

    Abstract

    Longitudinal studies increasingly collect rich 'omics' data sampled frequently over time and across large cohorts to capture dynamic health fluctuations and disease transitions. However, the generation of longitudinal omics data has preceded the development of analysis tools that can efficiently extract insights from such data. In particular, there is a need for statistical frameworks that can identify not only which omics features are differentially regulated between groups but also over what time intervals. Additionally, longitudinal omics data may have inconsistencies, including nonuniform sampling intervals, missing data points, subject dropout, and differing numbers of samples per subject. In this work, we developed OmicsLonDA, a statistical method that provides robust identification of time intervals of temporal omics biomarkers. OmicsLonDA is based on a semi-parametric approach, in which we use smoothing splines to model longitudinal data and infer significant time intervals of omics features based on an empirical distribution constructed through a permutation procedure. We benchmarked OmicsLonDA on five simulated datasets with diverse temporal patterns, and the method showed specificity greater than 0.99 and sensitivity greater than 0.87. Applying OmicsLonDA to the iPOP cohort revealed temporal patterns of genes, proteins, hormone metabolites, and microbes that are differentially regulated in male versus female subjects following a respiratory infection. In addition, we applied OmicsLonDA to the longitudinal multi-omics dataset of pregnant women with and without preeclampsia, and the method identified potential lipid markers that are temporally significantly different between the two groups. We provide an open-source R package (https://bioconductor.org/packages/OmicsLonDA), to enable widespread use. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btac403

    View details for PubMedID 35762936

  • KMT2D-NOTCH Mediates Coronary Abnormalities in Hypoplastic Left Heart Syndrome. Circulation research Yu, Z., Zhou, X., Liu, Z., Pastrana-Gomez, V., Liu, Y., Guo, M., Tian, L., Nelson, T. J., Wang, N., Mital, S., Chitayat, D., Wu, J. C., Rabinovitch, M., Wu, S. M., Snyder, M. P., Miao, Y., Gu, M. 2022: 101161CIRCRESAHA122320783

    View details for DOI 10.1161/CIRCRESAHA.122.320783

    View details for PubMedID 35762338

  • Serine biosynthesis as a novel therapeutic target for dilated cardiomyopathy. European heart journal Perea-Gil, I., Seeger, T., Bruyneel, A. A., Termglinchan, V., Monte, E., Lim, E. W., Vadgama, N., Furihata, T., Gavidia, A. A., Arthur Ataam, J., Bharucha, N., Martinez-Amador, N., Ameen, M., Nair, P., Serrano, R., Kaur, B., Feyen, D. A., Diecke, S., Snyder, M. P., Metallo, C. M., Mercola, M., Karakikes, I. 2022

    Abstract

    AIMS: Genetic dilated cardiomyopathy (DCM) is a leading cause of heart failure. Despite significant progress in understanding the genetic aetiologies of DCM, the molecular mechanisms underlying the pathogenesis of familial DCM remain unknown, translating to a lack of disease-specific therapies. The discovery of novel targets for the treatment of DCM was sought using phenotypic screening assays in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) that recapitulate the disease phenotypes in vitro. METHODS AND RESULTS: Using patient-specific iPSCs carrying a pathogenic TNNT2 gene mutation (p.R183W) and CRISPR-based genome editing, a faithful DCM model in vitro was developed. An unbiased phenotypic screening in TNNT2 mutant iPSC-derived cardiomyocytes (iPSC-CMs) with small molecule kinase inhibitors (SMKIs) was performed to identify novel therapeutic targets. Two SMKIs, Go 6976 and SB 203580, were discovered whose combinatorial treatment rescued contractile dysfunction in DCM iPSC-CMs carrying gene mutations of various ontologies (TNNT2, TTN, LMNA, PLN, TPM1, LAMA2). The combinatorial SMKI treatment upregulated the expression of genes that encode serine, glycine, and one-carbon metabolism enzymes and significantly increased the intracellular levels of glucose-derived serine and glycine in DCM iPSC-CMs. Furthermore, the treatment rescued the mitochondrial respiration defects and increased the levels of the tricarboxylic acid cycle metabolites and ATP in DCM iPSC-CMs. Finally, the rescue of the DCM phenotypes was mediated by the activating transcription factor 4 (ATF4) and its downstream effector genes, phosphoglycerate dehydrogenase (PHGDH), which encodes a critical enzyme of the serine biosynthesis pathway, and Tribbles 3 (TRIB3), a pseudokinase with pleiotropic cellular functions. CONCLUSIONS: A phenotypic screening platform using DCM iPSC-CMs was established for therapeutic target discovery. A combination of SMKIs ameliorated contractile and metabolic dysfunction in DCM iPSC-CMs mediated via the ATF4-dependent serine biosynthesis pathway. Together, these findings suggest that modulation of serine biosynthesis signalling may represent a novel genotype-agnostic therapeutic strategy for genetic DCM.

    View details for DOI 10.1093/eurheartj/ehac305

    View details for PubMedID 35728000

  • An exercise-inducible metabolite that suppresses feeding and obesity. Nature Li, V. L., He, Y., Contrepois, K., Liu, H., Kim, J. T., Wiggenhorn, A. L., Tanzo, J. T., Tung, A. S., Lyu, X., Zushin, P. H., Jansen, R. S., Michael, B., Loh, K. Y., Yang, A. C., Carl, C. S., Voldstedlund, C. T., Wei, W., Terrell, S. M., Moeller, B. C., Arthur, R. M., Wallis, G. A., van de Wetering, K., Stahl, A., Kiens, B., Richter, E. A., Banik, S. M., Snyder, M. P., Xu, Y., Long, J. Z. 2022

    Abstract

    Exercise confers protection against obesity, type 2 diabetes and other cardiometabolic diseases1-5. However, the molecular and cellular mechanisms that mediate the metabolic benefits of physical activity remain unclear6. Here we show that exercise stimulates the production of N-lactoyl-phenylalanine (Lac-Phe), a blood-borne signalling metabolite that suppresses feeding and obesity. The biosynthesis of Lac-Phe from lactate and phenylalanine occurs in CNDP2+ cells, including macrophages, monocytes and other immune and epithelial cells localized to diverse organs. In diet-induced obese mice, pharmacologically mediated increases in Lac-Phe reduce food intake without affecting movement or energy expenditure. Chronic administration of Lac-Phe decreases adiposity and body weight and improves glucose homeostasis. Conversely, genetic ablation of Lac-Phe biosynthesis in mice increases food intake and obesity following exercise training. Last, large activity-inducible increases in circulating Lac-Phe are also observed in humans and racehorses, establishing this metabolite as a molecular effector associated with physical activity across multiple activity modalities and mammalian species. These data define a conserved exercise-inducible metabolite that controls food intake and influences systemic energy balance.

    View details for DOI 10.1038/s41586-022-04828-5

    View details for PubMedID 35705806

  • Ultra-Low Input High-Fidelity (ULI-HiFi) long-reads uncover variants in genomic dark matter from pre-cancer polyp and tumor samples Lee, H., Erwin, G., Horning, A., Kirtikar, R., Griffin-Baldwin, E., Rowell, W., Li, P., Kingan, S., Snyder, M. P. AMER ASSOC CANCER RESEARCH. 2022
  • Endogenous Retroviral Elements Generate Pathologic Neutrophils in Pulmonary Arterial Hypertension. American journal of respiratory and critical care medicine Taylor, S., Isobe, S., Cao, A., Contrepois, K., Benayoun, B. A., Jiang, L., Wang, L., Melemenidis, S., Ozen, M. O., Otsuki, S., Shinohara, T., Sweatt, A. J., Kaplan, J., Moonen, J., Marciano, D. P., Gu, M., Miyagawa, K., Hayes, B., Sierra, R. G., Kupitz, C. J., Del Rosario, P. A., Hsi, A., Thompson, A. A., Ariza, M. E., Demirci, U., Zamanian, R. T., Haddad, F., Nicolls, M. R., Snyder, M. P., Rabinovitch, M. 2022

    Abstract

    RATIONALE: The role of neutrophils and their extracellular vesicles (EVs) in the pathogenesis of pulmonary arterial hypertension is unclear. OBJECTIVES: Relate functional abnormalities in pulmonary arterial hypertension neutrophils and their EVs to mechanisms uncovered by proteomic and transcriptomic profiling. METHODS: Production of elastase, release of extracellular traps, adhesion and migration were assessed in neutrophils from pulmonary arterial hypertension patients and control subjects. Proteomic analyses were applied to explain functional perturbations, and transcriptomic data were used to find underlying mechanisms. CD66b-specific neutrophil EVs were isolated from plasma of patients with pulmonary arterial hypertension and we determined whether they produce pulmonary hypertension in mice. MEASUREMENTS AND MAIN RESULTS: Neutrophils from pulmonary arterial hypertension patients produce and release increased neutrophil elastase, associated with enhanced extracellular traps. They exhibit reduced migration and increased adhesion attributed to elevated beta1 integrin and vinculin identified on proteomic analysis and previously linked to an antiviral response. This was substantiated by a transcriptomic interferon signature that we related to an increase in human endogenous retrovirus k envelope protein. Transfection of human endogenous retrovirus k envelope in a neutrophil cell line (HL-60) increases neutrophil elastase and interferon genes, whereas vinculin is increased by human endogenous retrovirus k dUTPase that is elevated in patient plasma. Neutrophil EVs from patient plasma contain increased neutrophil elastase and human endogenous retrovirus k envelope and induce pulmonary hypertension in mice, mitigated by elafin, an elastase inhibitor. CONCLUSIONS: Elevated human endogenous retroviral elements and elastase link a neutrophil innate immune response to pulmonary arterial hypertension.

    View details for DOI 10.1164/rccm.202102-0446OC

    View details for PubMedID 35696338

  • Wnt Signaling Interactor WTIP (Wilms Tumor Interacting Protein) Underlies Novel Mechanism for Cardiac Hypertrophy. Circulation. Genomic and precision medicine De Jong, H. N., Dewey, F. E., Cordero, P., Victorio, R. A., Kirillova, A., Huang, Y., Madhvani, R., Seo, K., Werdich, A. A., Lan, F., Orcholski, M., Robert Liu, W., Erbilgin, A., Wheeler, M. T., Chen, R., Pan, S., Kim, Y. M., Bommakanti, K., Marcou, C. A., Martijn Bos, J., Haddad, F., Ackerman, M., Vasan, R. S., MacRae, C., Wu, J. C., de Jesus Perez, V., Snyder, M., Parikh, V. N., Ashley, E. A. 2022: 101161CIRCGEN121003563

    Abstract

    BACKGROUND: The study of hypertrophic cardiomyopathy (HCM), a severe Mendelian disease, can yield insight into the mechanisms underlying the complex trait of cardiac hypertrophy. To date, most genetic variants associated with HCM have been found in sarcomeric genes. Here, we describe a novel HCM-associated variant in the noncanonical Wnt signaling interactor WTIP (Wilms tumor interacting protein) and provide evidence of a role for WTIP in complex disease. METHODS: In a family affected by HCM, we used exome sequencing and identity-by-descent analysis to identify a novel variant in WTIP (p.Y233F). We knocked down WTIP in isolated neonatal rat ventricular myocytes with lentivirally delivered shRNAs and in Danio rerio via morpholino injection. We performed weighted gene coexpression network analysis for WTIP in human cardiac tissue, as well as association analysis for WTIP variation and left ventricular hypertrophy. Finally, we generated induced pluripotent stem cell-derived cardiomyocytes from patient tissue, characterized size and calcium cycling, and determined the effect of verapamil treatment on calcium dynamics. RESULTS: WTIP knockdown caused hypertrophy in neonatal rat ventricular myocytes and increased cardiac hypertrophy, peak calcium, and resting calcium in D. rerio. Network analysis of human cardiac tissue indicated WTIP as a central coordinator of prohypertrophic networks, while common variation at the WTIP locus was associated with human left ventricular hypertrophy. Patient-derived WTIP p.Y233F-induced pluripotent stem cell-derived cardiomyocytes recapitulated cellular hypertrophy and increased resting calcium, which was ameliorated by verapamil. CONCLUSIONS: We demonstrate that a novel genetic variant found in a family with HCM disrupts binding to a known Wnt signaling protein, misregulating cardiomyocyte calcium dynamics. Further, in orthogonal model systems, we show that expression of the gene WTIP is important in complex cardiac hypertrophy phenotypes. These findings, derived from the observation of a rare Mendelian disease variant, uncover a novel disease mechanism with implications across diverse forms of cardiac hypertrophy.

    View details for DOI 10.1161/CIRCGEN.121.003563

    View details for PubMedID 35671065

  • A cancer-associated RNA polymerase III identity drives robust transcription and expression of snaR-A noncoding RNA. Nature communications Van Bortle, K., Marciano, D. P., Liu, Q., Chou, T., Lipchik, A. M., Gollapudi, S., Geller, B. S., Monte, E., Kamakaka, R. T., Snyder, M. P. 2022; 13 (1): 3007

    Abstract

    RNA polymerase III (Pol III) includes two alternate isoforms, defined by mutually exclusive incorporation of subunit POLR3G (RPC7alpha) or POLR3GL (RPC7beta), in mammals. The contributions of POLR3G and POLR3GL to transcription potential have remained poorly defined. Here, we discover that loss of subunit POLR3G is accompanied by a restricted repertoire of genes transcribed by Pol III. Particularly sensitive is snaR-A, a small noncoding RNA implicated in cancer proliferation and metastasis. Analysis of Pol III isoform biases and downstream chromatin features identifies loss of POLR3G and snaR-A during differentiation, and conversely, re-establishment of POLR3G gene expression and SNAR-A gene features in cancer contexts. Our results support a model in which Pol III identity functions as an important transcriptional regulatory mechanism. Upregulation of POLR3G, which is driven by MYC, identifies a subgroup of patients with unfavorable survival outcomes in specific cancers, further implicating the POLR3G-enhanced transcription repertoire as a potential disease factor.

    View details for DOI 10.1038/s41467-022-30323-6

    View details for PubMedID 35637192

  • Prediction of gestational age using urinary metabolites in term and preterm pregnancies. Scientific reports Contrepois, K., Chen, S., Ghaemi, M. S., Wong, R. J., Alliance for Maternal and Newborn Health Improvement (AMANHI), Global Alliance to Prevent Prematurity and Stillbirth (GAPPS), Shaw, G., Stevenson, D. K., Aghaeepour, N., Snyder, M. P., Jehan, F., Sazawal, S., Baqui, A. H., Nisar, M. I., Dhingra, U., Khanam, R., Ilyas, M., Dutta, A., Mehmood, U., Deb, S., Hotwani, A., Ali, S. M., Rahman, S., Nizar, A., Ame, S. M., Muhammad, S., Chauhan, A., Khan, W., Raqib, R., Das, S., Ahmed, S., Hasan, T., Khalid, J., Juma, M. H., Chowdhury, N. H., Kabir, F., Aftab, F., Quaiyum, M. A., Manu, A., Yoshida, S., Bahl, R., Rahman, A., Pervin, J., Price, J. T., Rahman, M., Kasaro, M. P., Litch, J. A., Musonda, P., Vwalika, B., Stringer, J. S. 2022; 12 (1): 8033

    Abstract

    Assessment of gestational age (GA) is key to providing optimal care during pregnancy. However, its accurate determination remains challenging in low- and middle-income countries, where access to obstetric ultrasound is limited. Hence, there is an urgent need to develop clinical approaches that allow accurate and inexpensive estimations of GA. We investigated the ability of urinary metabolites to predict GA at time of collection in a diverse multi-site cohort of healthy and pathological pregnancies (n=99) using a broad-spectrum liquid chromatography coupled with mass spectrometry (LC-MS) platform. Our approach detected a myriad of steroid hormones and their derivatives including estrogens, progesterones, corticosteroids, and androgens which were associated with pregnancy progression. We developed a restricted model that predicted GA with high accuracy using three metabolites (rho=0.87, RMSE=1.58 weeks) that was validated in an independent cohort (n=20). The predictions were more robust in pregnancies that went to term in comparison to pregnancies that ended prematurely. Overall, we demonstrated the feasibility of implementing urine metabolomics analysis in large-scale multi-site studies and report a predictive model of GA with a potential clinical value.

    View details for DOI 10.1038/s41598-022-11866-6

    View details for PubMedID 35577875

  • Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature ENCODE Project Consortium, Moore, J. E., Purcaro, M. J., Pratt, H. E., Epstein, C. B., Shoresh, N., Adrian, J., Kawli, T., Davis, C. A., Dobin, A., Kaul, R., Halow, J., Van Nostrand, E. L., Freese, P., Gorkin, D. U., Shen, Y., He, Y., Mackiewicz, M., Pauli-Behn, F., Williams, B. A., Mortazavi, A., Keller, C. A., Zhang, X., Elhajjajy, S. I., Huey, J., Dickel, D. E., Snetkova, V., Wei, X., Wang, X., Rivera-Mulia, J. C., Rozowsky, J., Zhang, J., Chhetri, S. B., Zhang, J., Victorsen, A., White, K. P., Visel, A., Yeo, G. W., Burge, C. B., Lecuyer, E., Gilbert, D. M., Dekker, J., Rinn, J., Mendenhall, E. M., Ecker, J. R., Kellis, M., Klein, R. J., Noble, W. S., Kundaje, A., Guigo, R., Farnham, P. J., Cherry, J. M., Myers, R. M., Ren, B., Graveley, B. R., Gerstein, M. B., Pennacchio, L. A., Snyder, M. P., Bernstein, B. E., Wold, B., Hardison, R. C., Gingeras, T. R., Stamatoyannopoulos, J. A., Weng, Z., Abascal, F., Acosta, R., Addleman, N. J., Adrian, J., Afzal, V., Aken, B., Akiyama, J. A., Jammal, O. A., Amrhein, H., Anderson, S. M., Andrews, G. R., Antoshechkin, I., Ardlie, K. G., Armstrong, J., Astley, M., Banerjee, B., Barkal, A. A., Barnes, I. H., Barozzi, I., Barrell, D., Barson, G., Bates, D., Baymuradov, U. K., Bazile, C., Beer, M. A., Beik, S., Bender, M. A., Bennett, R., Bouvrette, L. P., Bernstein, B. E., Berry, A., Bhaskar, A., Bignell, A., Blue, S. M., Bodine, D. M., Boix, C., Boley, N., Borrman, T., Borsari, B., Boyle, A. P., Brandsmeier, L. A., Breschi, A., Bresnick, E. H., Brooks, J. A., Buckley, M., Burge, C. B., Byron, R., Cahill, E., Cai, L., Cao, L., Carty, M., Castanon, R. G., Castillo, A., Chaib, H., Chan, E. T., Chee, D. R., Chee, S., Chen, H., Chen, H., Chen, J., Chen, S., Cherry, J. M., Chhetri, S. B., Choudhary, J. S., Chrast, J., Chung, D., Clarke, D., Cody, N. A., Coppola, C. J., Coursen, J., D'Ippolito, A. 
M., Dalton, S., Danyko, C., Davidson, C., Davila-Velderrain, J., Davis, C. A., Dekker, J., Deran, A., DeSalvo, G., Despacio-Reyes, G., Dewey, C. N., Dickel, D. E., Diegel, M., Diekhans, M., Dileep, V., Ding, B., Djebali, S., Dobin, A., Dominguez, D., Donaldson, S., Drenkow, J., Dreszer, T. R., Drier, Y., Duff, M. O., Dunn, D., Eastman, C., Ecker, J. R., Edwards, M. D., El-Ali, N., Elhajjajy, S. I., Elkins, K., Emili, A., Epstein, C. B., Evans, R. C., Ezkurdia, I., Fan, K., Farnham, P. J., Farrell, N. P., Feingold, E. A., Ferreira, A., Fisher-Aylor, K., Fitzgerald, S., Flicek, P., Foo, C. S., Fortier, K., Frankish, A., Freese, P., Fu, S., Fu, X., Fu, Y., Fukuda-Yuzawa, Y., Fulciniti, M., Funnell, A. P., Gabdank, I., Galeev, T., Gao, M., Giron, C. G., Garvin, T. H., Gelboin-Burkhart, C. A., Georgolopoulos, G., Gerstein, M. B., Giardine, B. M., Gifford, D. K., Gilbert, D. M., Gilchrist, D. A., Gillespie, S., Gingeras, T. R., Gong, P., Gonzalez, A., Gonzalez, J. M., Good, P., Goren, A., Gorkin, D. U., Graveley, B. R., Gray, M., Greenblatt, J. F., Griffiths, E., Groudine, M. T., Grubert, F., Gu, M., Guigo, R., Guo, H., Guo, Y., Guo, Y., Gursoy, G., Gutierrez-Arcelus, M., Halow, J., Hardison, R. C., Hardy, M., Hariharan, M., Harmanci, A., Harrington, A., Harrow, J. L., Hashimoto, T. B., Hasz, R. D., Hatan, M., Haugen, E., Hayes, J. E., He, P., He, Y., Heidari, N., Hendrickson, D., Heuston, E. F., Hilton, J. A., Hitz, B. C., Hochman, A., Holgren, C., Hou, L., Hou, S., Hsiao, Y. E., Hsu, S., Huang, H., Hubbard, T. J., Huey, J., Hughes, T. R., Hunt, T., Ibarrientos, S., Issner, R., Iwata, M., Izuogu, O., Jaakkola, T., Jameel, N., Jansen, C., Jiang, L., Jiang, P., Johnson, A., Johnson, R., Jungreis, I., Kadaba, M., Kasowski, M., Kasparian, M., Kato, M., Kaul, R., Kawli, T., Kay, M., Keen, J. C., Keles, S., Keller, C. A., Kelley, D., Kellis, M., Kheradpour, P., Kim, D. S., Kirilusha, A., Klein, R. J., Knoechel, B., Kuan, S., Kulik, M. 
J., Kumar, S., Kundaje, A., Kutyavin, T., Lagarde, J., Lajoie, B. R., Lambert, N. J., Lazar, J., Lee, A. Y., Lee, D., Lee, E., Lee, J. W., Lee, K., Leslie, C. S., Levy, S., Li, B., Li, H., Li, N., Li, X., Li, Y. I., Li, Y., Li, Y., Li, Y., Lian, J., Libbrecht, M. W., Lin, S., Lin, Y., Liu, D., Liu, J., Liu, P., Liu, T., Liu, X. S., Liu, Y., Liu, Y., Long, M., Lou, S., Loveland, J., Lu, A., Lu, Y., Lecuyer, E., Ma, L., Mackiewicz, M., Mannion, B. J., Mannstadt, M., Manthravadi, D., Marinov, G. K., Martin, F. J., Mattei, E., McCue, K., McEown, M., McVicker, G., Meadows, S. K., Meissner, A., Mendenhall, E. M., Messer, C. L., Meuleman, W., Meyer, C., Miller, S., Milton, M. G., Mishra, T., Moore, D. E., Moore, H. M., Moore, J. E., Moore, S. H., Moran, J., Mortazavi, A., Mudge, J. M., Munshi, N., Murad, R., Myers, R. M., Nandakumar, V., Nandi, P., Narasimha, A. M., Narayanan, A. K., Naughton, H., Navarro, F. C., Navas, P., Nazarovs, J., Nelson, J., Neph, S., Neri, F. J., Nery, J. R., Nesmith, A. R., Newberry, J. S., Newberry, K. M., Ngo, V., Nguyen, R., Nguyen, T. B., Nguyen, T., Nishida, A., Noble, W. S., Novak, C. S., Novoa, E. M., Nunez, B., O'Donnell, C. W., Olson, S., Onate, K. C., Otterman, E., Ozadam, H., Pagan, M., Palden, T., Pan, X., Park, Y., Partridge, E. C., Paten, B., Pauli-Behn, F., Pazin, M. J., Pei, B., Pennacchio, L. A., Perez, A. R., Perry, E. H., Pervouchine, D. D., Phalke, N. N., Pham, Q., Phanstiel, D. H., Plajzer-Frick, I., Pratt, G. A., Pratt, H. E., Preissl, S., Pritchard, J. K., Pritykin, Y., Purcaro, M. J., Qin, Q., Quinones-Valdez, G., Rabano, I., Radovani, E., Raj, A., Rajagopal, N., Ram, O., Ramirez, L., Ramirez, R. N., Rausch, D., Raychaudhuri, S., Raymond, J., Razavi, R., Reddy, T. E., Reimonn, T. M., Ren, B., Reymond, A., Reynolds, A., Rhie, S. K., Rinn, J., Rivera, M., Rivera-Mulia, J. C., Roberts, B. S., Rodriguez, J. M., Rozowsky, J., Ryan, R., Rynes, E., Salins, D. 
N., Sandstrom, R., Sasaki, T., Sathe, S., Savic, D., Scavelli, A., Scheiman, J., Schlaffner, C., Schloss, J. A., Schmitges, F. W., See, L. H., Sethi, A., Setty, M., Shafer, A., Shan, S., Sharon, E., Shen, Q., Shen, Y., Sherwood, R. I., Shi, M., Shin, S., Shoresh, N., Siebenthall, K., Sisu, C., Slifer, T., Sloan, C. A., Smith, A., Snetkova, V., Snyder, M. P., Spacek, D. V., Srinivasan, S., Srivas, R., Stamatoyannopoulos, G., Stamatoyannopoulos, J. A., Stanton, R., Steffan, D., Stehling-Sun, S., Strattan, J. S., Su, A., Sundararaman, B., Suner, M., Syed, T., Szynkarek, M., Tanaka, F. Y., Tenen, D., Teng, M., Thomas, J. A., Toffey, D., Tress, M. L., Trout, D. E., Trynka, G., Tsuji, J., Upchurch, S. A., Ursu, O., Uszczynska-Ratajczak, B., Uziel, M. C., Valencia, A., Biber, B. V., van der Velde, A. G., Van Nostrand, E. L., Vaydylevich, Y., Vazquez, J., Victorsen, A., Vielmetter, J., Vierstra, J., Visel, A., Vlasova, A., Vockley, C. M., Volpi, S., Vong, S., Wang, H., Wang, M., Wang, Q., Wang, R., Wang, T., Wang, W., Wang, X., Wang, Y., Watson, N. K., Wei, X., Wei, Z., Weisser, H., Weissman, S. M., Welch, R., Welikson, R. E., Weng, Z., Westra, H., Whitaker, J. W., White, C., White, K. P., Wildberg, A., Williams, B. A., Wine, D., Witt, H. N., Wold, B., Wolf, M., Wright, J., Xiao, R., Xiao, X., Xu, J., Xu, J., Yan, K., Yan, Y., Yang, H., Yang, X., Yang, Y., Yardimci, G. G., Yee, B. A., Yeo, G. W., Young, T., Yu, T., Yue, F., Zaleski, C., Zang, C., Zeng, H., Zeng, W., Zerbino, D. R., Zhai, J., Zhan, L., Zhan, Y., Zhang, B., Zhang, J., Zhang, J., Zhang, K., Zhang, L., Zhang, P., Zhang, Q., Zhang, X., Zhang, Y., Zhang, Z., Zhao, Y., Zheng, Y., Zhong, G., Zhou, X., Zhu, Y., Zimmerman, J., Ai, R., Li, S. 2022

    View details for DOI 10.1038/s41586-021-04226-3

    View details for PubMedID 35474001

  • Author Correction: Perspectives on ENCODE. Nature ENCODE Project Consortium, Snyder, M. P., Gingeras, T. R., Moore, J. E., Weng, Z., Gerstein, M. B., Ren, B., Hardison, R. C., Stamatoyannopoulos, J. A., Graveley, B. R., Feingold, E. A., Pazin, M. J., Pagan, M., Gilchrist, D. A., Hitz, B. C., Cherry, J. M., Bernstein, B. E., Mendenhall, E. M., Zerbino, D. R., Frankish, A., Flicek, P., Myers, R. M., Abascal, F. B., Acosta, R., Addleman, N. J., Adrian, J., Afzal, V., Aken, B., Ai, R., Akiyama, J. A., Jammal, O. A., Amrhein, H., Anderson, S. M., Andrews, G. R., Antoshechkin, I., Ardlie, K. G., Armstrong, J., Astley, M., Banerjee, B., Barkal, A. A., Barnes, I. H., Barozzi, I., Barrell, D., Barson, G., Bates, D., Baymuradov, U. K., Bazile, C., Beer, M. A., Beik, S., Bender, M. A., Bennett, R., Bouvrette, L. P., Bernstein, B. E., Berry, A., Bhaskar, A., Bignell, A., Blue, S. M., Bodine, D. M., Boix, C., Boley, N., Borrman, T., Borsari, B., Boyle, A. P., Brandsmeier, L. A., Breschi, A., Bresnick, E. H., Brooks, J. A., Buckley, M., Burge, C. B., Byron, R., Cahill, E., Cai, L., Cao, L., Carty, M., Castanon, R. G., Castillo, A., Chaib, H., Chan, E. T., Chee, D. R., Chee, S., Chen, H., Chen, H., Chen, J., Chen, S., Cherry, J. M., Chhetri, S. B., Choudhary, J. S., Chrast, J., Chung, D., Clarke, D., Cody, N. A., Coppola, C. J., Coursen, J., D'Ippolito, A. M., Dalton, S., Danyko, C., Davidson, C., Davila-Velderrain, J., Davis, C. A., Dekker, J., Deran, A., DeSalvo, G., Despacio-Reyes, G., Dewey, C. N., Dickel, D. E., Diegel, M., Diekhans, M., Dileep, V., Ding, B., Djebali, S., Dobin, A., Dominguez, D., Donaldson, S., Drenkow, J., Dreszer, T. R., Drier, Y., Duff, M. O., Dunn, D., Eastman, C., Ecker, J. R., Edwards, M. D., El-Ali, N., Elhajjajy, S. I., Elkins, K., Emili, A., Epstein, C. B., Evans, R. C., Ezkurdia, I., Fan, K., Farnham, P. J., Farrell, N., Feingold, E. A., Ferreira, A., Fisher-Aylor, K., Fitzgerald, S., Flicek, P., Foo, C. 
S., Fortier, K., Frankish, A., Freese, P., Fu, S., Fu, X., Fu, Y., Fukuda-Yuzawa, Y., Fulciniti, M., Funnell, A. P., Gabdank, I., Galeev, T., Gao, M., Giron, C. G., Garvin, T. H., Gelboin-Burkhart, C. A., Georgolopoulos, G., Gerstein, M. B., Giardine, B. M., Gifford, D. K., Gilbert, D. M., Gilchrist, D. A., Gillespie, S., Gingeras, T. R., Gong, P., Gonzalez, A., Gonzalez, J. M., Good, P., Goren, A., Gorkin, D. U., Graveley, B. R., Gray, M., Greenblatt, J. F., Griffiths, E., Groudine, M. T., Grubert, F., Gu, M., Guigo, R., Guo, H., Guo, Y., Guo, Y., Gursoy, G., Gutierrez-Arcelus, M., Halow, J., Hardison, R. C., Hardy, M., Hariharan, M., Harmanci, A., Harrington, A., Harrow, J. L., Hashimoto, T. B., Hasz, R. D., Hatan, M., Haugen, E., Hayes, J. E., He, P., He, Y., Heidari, N., Hendrickson, D., Heuston, E. F., Hilton, J. A., Hitz, B. C., Hochman, A., Holgren, C., Hou, L., Hou, S., Hsiao, Y. E., Hsu, S., Huang, H., Hubbard, T. J., Huey, J., Hughes, T. R., Hunt, T., Ibarrientos, S., Issner, R., Iwata, M., Izuogu, O., Jaakkola, T., Jameel, N., Jansen, C., Jiang, L., Jiang, P., Johnson, A., Johnson, R., Jungreis, I., Kadaba, M., Kasowski, M., Kasparian, M., Kato, M., Kaul, R., Kawli, T., Kay, M., Keen, J. C., Keles, S., Keller, C. A., Kelley, D., Kellis, M., Kheradpour, P., Kim, D. S., Kirilusha, A., Klein, R. J., Knoechel, B., Kuan, S., Kulik, M. J., Kumar, S., Kundaje, A., Kutyavin, T., Lagarde, J., Lajoie, B. R., Lambert, N. J., Lazar, J., Lee, A. Y., Lee, D., Lee, E., Lee, J. W., Lee, K., Leslie, C. S., Levy, S., Li, B., Li, H., Li, N., Li, S., Li, X., Li, Y. I., Li, Y., Li, Y., Li, Y., Lian, J., Libbrecht, M. W., Lin, S., Lin, Y., Liu, D., Liu, J., Liu, P., Liu, T., Liu, X. S., Liu, Y., Liu, Y., Long, M., Lou, S., Loveland, J., Lu, A., Lu, Y., Lecuyer, E., Ma, L., Mackiewicz, M., Mannion, B. J., Mannstadt, M., Manthravadi, D., Marinov, G. K., Martin, F. J., Mattei, E., McCue, K., McEown, M., McVicker, G., Meadows, S. K., Meissner, A., Mendenhall, E. M., Messer, C. 
L., Meuleman, W., Meyer, C., Miller, S., Milton, M. G., Mishra, T., Moore, D. E., Moore, H. M., Moore, J. E., Moore, S. H., Moran, J., Mortazavi, A., Mudge, J. M., Munshi, N., Murad, R., Myers, R. M., Nandakumar, V., Nandi, P., Narasimha, A. M., Narayanan, A. K., Naughton, H., Navarro, F. C., Navas, P., Nazarovs, J., Nelson, J., Neph, S., Neri, F. J., Nery, J. R., Nesmith, A. R., Newberry, J. S., Newberry, K. M., Ngo, V., Nguyen, R., Nguyen, T. B., Nguyen, T., Nishida, A., Noble, W. S., Novak, C. S., Novoa, E. M., Nunez, B., O'Donnell, C. W., Olson, S., Onate, K. C., Otterman, E., Ozadam, H., Pagan, M., Palden, T., Pan, X., Park, Y., Partridge, E. C., Paten, B., Pauli-Behn, F., Pazin, M. J., Pei, B., Pennacchio, L. A., Perez, A. R., Perry, E. H., Pervouchine, D. D., Phalke, N. N., Pham, Q., Phanstiel, D. H., Plajzer-Frick, I., Pratt, G. A., Pratt, H. E., Preissl, S., Pritchard, J. K., Pritykin, Y., Purcaro, M. J., Qin, Q., Quinones-Valdez, G., Rabano, I., Radovani, E., Raj, A., Rajagopal, N., Ram, O., Ramirez, L., Ramirez, R. N., Rausch, D., Raychaudhuri, S., Raymond, J., Razavi, R., Reddy, T. E., Reimonn, T. M., Ren, B., Reymond, A., Reynolds, A., Rhie, S. K., Rinn, J., Rivera, M., Rivera-Mulia, J. C., Roberts, B., Rodriguez, J. M., Rozowsky, J., Ryan, R., Rynes, E., Salins, D. N., Sandstrom, R., Sasaki, T., Sathe, S., Savic, D., Scavelli, A., Scheiman, J., Schlaffner, C., Schloss, J. A., Schmitges, F. W., See, L. H., Sethi, A., Setty, M., Shafer, A., Shan, S., Sharon, E., Shen, Q., Shen, Y., Sherwood, R. I., Shi, M., Shin, S., Shoresh, N., Siebenthall, K., Sisu, C., Slifer, T., Sloan, C. A., Smith, A., Snetkova, V., Snyder, M. P., Spacek, D. V., Srinivasan, S., Srivas, R., Stamatoyannopoulos, G., Stamatoyannopoulos, J. A., Stanton, R., Steffan, D., Stehling-Sun, S., Strattan, J. S., Su, A., Sundararaman, B., Suner, M., Syed, T., Szynkarek, M., Tanaka, F. Y., Tenen, D., Teng, M., Thomas, J. A., Toffey, D., Tress, M. L., Trout, D. 
E., Trynka, G., Tsuji, J., Upchurch, S. A., Ursu, O., Uszczynska-Ratajczak, B., Uziel, M. C., Valencia, A., Biber, B. V., van der Velde, A. G., Van Nostrand, E. L., Vaydylevich, Y., Vazquez, J., Victorsen, A., Vielmetter, J., Vierstra, J., Visel, A., Vlasova, A., Vockley, C. M., Volpi, S., Vong, S., Wang, H., Wang, M., Wang, Q., Wang, R., Wang, T., Wang, W., Wang, X., Wang, Y., Watson, N. K., Wei, X., Wei, Z., Weisser, H., Weissman, S. M., Welch, R., Welikson, R. E., Weng, Z., Westra, H., Whitaker, J. W., White, C., White, K. P., Wildberg, A., Williams, B. A., Wine, D., Witt, H. N., Wold, B., Wolf, M., Wright, J., Xiao, R., Xiao, X., Xu, J., Xu, J., Yan, K., Yan, Y., Yang, H., Yang, X., Yang, Y., Yardimci, G. G., Yee, B. A., Yeo, G. W., Young, T., Yu, T., Yue, F., Zaleski, C., Zang, C., Zeng, H., Zeng, W., Zerbino, D. R., Zhai, J., Zhan, L., Zhan, Y., Zhang, B., Zhang, J., Zhang, J., Zhang, K., Zhang, L., Zhang, P., Zhang, Q., Zhang, X., Zhang, Y., Zhang, Z., Zhao, Y., Zheng, Y., Zhong, G., Zhou, X., Zhu, Y., Zimmerman, J. 2022

    View details for DOI 10.1038/s41586-021-04213-8

    View details for PubMedID 35474002

  • A machine learning algorithm with subclonal sensitivity reveals widespread pan-cancer human leukocyte antigen loss of heterozygosity. Nature communications Pyke, R. M., Mellacheruvu, D., Dea, S., Abbott, C. W., McDaniel, L., Bhave, D. P., Zhang, S. V., Levy, E., Bartha, G., West, J., Snyder, M. P., Chen, R. O., Boyle, S. M. 2022; 13 (1): 1925

    Abstract

    Human leukocyte antigen loss of heterozygosity (HLA LOH) allows cancer cells to escape immune recognition by deleting HLA alleles, causing the suppressed presentation of tumor neoantigens. Despite its importance in immunotherapy response, few methods exist to detect HLA LOH, and their accuracy is not well understood. Here, we develop DASH (Deletion of Allele-Specific HLAs), a machine learning-based algorithm to detect HLA LOH from paired tumor-normal sequencing data. With cell line mixtures, we demonstrate increased sensitivity compared to previously published tools. Moreover, our patient-specific digital PCR validation approach provides a sensitive, robust orthogonal approach that could be used for clinical validation. Using DASH on 610 patients across 15 tumor types, we find that 18% of patients have HLA LOH. Moreover, we show inflated HLA LOH rates compared to genome-wide LOH and correlations between CD274 (encodes PD-L1) expression and microsatellite instability status, suggesting that HLA LOH is a key immune resistance strategy.

    View details for DOI 10.1038/s41467-022-29203-w

    View details for PubMedID 35414054

  • Exploring disease interrelationships in patients with lymphatic disorders: A single center retrospective experience. Clinical and translational medicine Rockson, S. G., Zhou, X., Zhao, L., Hosseini, D. K., Jiang, X., Sweatt, A. J., Kim, D., Tian, W., Snyder, M. P., Nicolls, M. R. 2022; 12 (4): e760

    Abstract

    The lymphatic contribution to the circulation is of paramount importance in regulating fluid homeostasis, immune cell trafficking/activation and lipid metabolism. In comparison to the blood vasculature, the impact of the lymphatics has been underappreciated, both in health and disease, likely due to a less well-delineated anatomy and function. Emerging data suggest that lymphatic dysfunction can be pivotal in the initiation and development of a variety of diseases across broad organ systems. Understanding the clinical associations between lymphatic dysfunction and non-lymphatic morbidity provides valuable evidence for future investigations and may foster the discovery of novel biomarkers and therapies. We retrospectively analysed the electronic medical records of 724 patients referred to the Stanford Center for Lymphatic and Venous Disorders. Patients with an established lymphatic diagnosis were assigned to groups of secondary lymphoedema, lipoedema or primary lymphovascular disease. Individuals found to have no lymphatic disorder served as the non-lymphatic controls. The prevalence of comorbid conditions was enumerated. Pairwise co-occurrence pattern analysis, validated by Jaccard similarity tests, was utilised to investigate disease-disease interrelationships. Comorbidity analyses underscored the expected relationship between the presence of secondary lymphoedema and those diseases that damage the lymphatics. Cardiovascular conditions were common in all lymphatic subgroups. Additionally, statistically significant alteration of disease-disease interrelationships was noted in all three lymphatic categories when compared to the control population. The presence or absence of a lymphatic disease significantly influences disease interrelationships in the study cohorts. As a physiologic substrate, the lymphatic circulation may be an underappreciated participant in disease pathogenesis. These relationships warrant further, prospective scrutiny and study.

    View details for DOI 10.1002/ctm2.760

    View details for PubMedID 35452183

  • A Method for Intelligent Allocation of Diagnostic Testing by Leveraging Data from Commercial Wearable Devices: A Case Study on COVID-19. Research square Dunn, J., Shandhi, M. H., Cho, P., Roghanizad, A., Singh, K., Wang, W., Enache, O., Stern, A., Sbahi, R., Tatar, B., Fiscus, S., Khoo, Q. X., Kuo, Y., Lu, X., Hsieh, J., Kalodzitsa, A., Bahmani, A., Alavi, A., Ray, U., Snyder, M., Ginsburg, G., Pasquale, D., Woods, C., Shaw, R. 2022

    Abstract

    Mass surveillance testing can help control outbreaks of infectious diseases such as COVID-19. However, diagnostic test shortages are prevalent globally and continue to occur in the US with the onset of new COVID-19 variants, demonstrating an unprecedented need for improving our current methods for mass surveillance testing. By targeting surveillance testing towards individuals who are most likely to be infected and, thus, increasing testing positivity rate (i.e., percent positive in the surveillance group), fewer tests are needed to capture the same number of positive cases. Here, we developed an Intelligent Testing Allocation (ITA) method by leveraging data from the CovIdentify study (6,765 participants) and the MyPHD study (8,580 participants), including smartwatch data from 1,265 individuals of whom 126 tested positive for COVID-19. Our rigorous model and parameter search uncovered the optimal time periods and aggregate metrics for monitoring continuous digital biomarkers to increase the positivity rate of COVID-19 diagnostic testing. We found that resting heart rate features distinguished between COVID-19 positive and negative cases earlier in the course of the infection than steps features, as early as ten and five days prior to the diagnostic test, respectively. We also found that including steps features increased the area under the receiver operating characteristic curve (AUC-ROC) by 7-11% when compared with RHR features alone, while including RHR features improved the AUC of the ITA model's precision-recall curve (AUC-PR) by 38-50% when compared with steps features alone. The best AUC-ROC (0.73 ± 0.14 and 0.77 on the cross-validated training set and independent test set, respectively) and AUC-PR (0.55 ± 0.21 and 0.24) were achieved by using data from a single device type (Fitbit) with high-resolution (minute-level) data. 
Finally, we show that ITA generates up to a 6.5-fold increase in the positivity rate in the cross-validated training set and up to a 3-fold increase in the positivity rate in the independent test set, including both symptomatic and asymptomatic (up to 27%) individuals. Our findings suggest that, if deployed on a large scale and without needing self-reported symptoms, the ITA method could improve allocation of diagnostic testing resources and reduce the burden of test shortages.

    View details for DOI 10.21203/rs.3.rs-1490524/v1

    View details for PubMedID 35378754

    View details for PubMedCentralID PMC8978951
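
    The positivity-rate arithmetic behind the fold-increase figures in the abstract above can be illustrated with a minimal sketch. The function names and the counts below are hypothetical and chosen only to show the calculation; they are not taken from the study.

```python
def positivity_rate(n_positive, n_tested):
    """Fraction of administered tests that came back positive."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_positive / n_tested


def fold_increase(targeted_rate, baseline_rate):
    """How many times higher the targeted positivity rate is than baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return targeted_rate / baseline_rate


# Hypothetical counts (assumptions for illustration, not study data).
baseline = positivity_rate(n_positive=25, n_tested=200)  # untargeted testing
targeted = positivity_rate(n_positive=50, n_tested=100)  # model-guided testing
print(fold_increase(targeted, baseline))
```

    A higher fold increase means fewer tests are needed to find the same number of positive cases.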

  • Exerkines in health, resilience and disease. Nature reviews. Endocrinology Chow, L. S., Gerszten, R. E., Taylor, J. M., Pedersen, B. K., van Praag, H., Trappe, S., Febbraio, M. A., Galis, Z. S., Gao, Y., Haus, J. M., Lanza, I. R., Lavie, C. J., Lee, C., Lucia, A., Moro, C., Pandey, A., Robbins, J. M., Stanford, K. I., Thackray, A. E., Villeda, S., Watt, M. J., Xia, A., Zierath, J. R., Goodpaster, B. H., Snyder, M. P. 2022

    Abstract

    The health benefits of exercise are well-recognized and are observed across multiple organ systems. These beneficial effects enhance overall resilience, healthspan and longevity. The molecular mechanisms that underlie the beneficial effects of exercise, however, remain poorly understood. Since the discovery in 2000 that muscle contraction releases IL-6, the number of exercise-associated signalling molecules that have been identified has multiplied. Exerkines are defined as signalling moieties released in response to acute and/or chronic exercise, which exert their effects through endocrine, paracrine and/or autocrine pathways. A multitude of organs, cells and tissues release these factors, including skeletal muscle (myokines), the heart (cardiokines), liver (hepatokines), white adipose tissue (adipokines), brown adipose tissue (baptokines) and neurons (neurokines). Exerkines have potential roles in improving cardiovascular, metabolic, immune and neurological health. As such, exerkines have potential for the treatment of cardiovascular disease, type 2 diabetes mellitus and obesity, and possibly in the facilitation of healthy ageing. This Review summarizes the importance and current state of exerkine research, prevailing challenges and future directions.

    View details for DOI 10.1038/s41574-022-00641-2

    View details for PubMedID 35304603

  • Effects of an immersive psychosocial training program on depression and well-being: A randomized clinical trial. Journal of psychiatric research Ganz, A. B., Rolnik, B., Chakraborty, M., Wilson, J., Tau, C., Sharp, M., Reber, D., Slavich, G. M., Snyder, M. P. 2022; 150: 292-299

    Abstract

    Psychiatry stands to benefit from brief non-pharmacological treatments that effectively reduce depressive symptoms. To address this need, we conducted a single-blind randomized clinical trial assessing how a 6-day immersive psychosocial training program, followed by 10-min daily psychosocial exercises for 30 days, improves depressive symptoms. Forty-five adults were block-randomized by depression score to two arms: (a) the immersive psychosocial training program and 10-min daily exercise group (36 days total; total n=23; depressed at baseline n=14); or (b) a gratitude journaling control group (36 days total; total n=22; depressed at baseline n=13). The self-report PHQ-9 was used to assess depression levels in both groups at three time points: baseline, study week one, and study week six. Depression severity improved over time, with a significantly greater reduction in the psychosocial training program group (-82.7%) vs. the control group (-23%), p=0.02 for baseline vs. week six. The effect size for this reduction in depression symptoms was large for the intervention group (d=-1.3; 95% CI, -2.07, -0.45; p<0.001) and small for the control group (d=-0.3; 95% CI, -0.68, 0.03; p=0.22). Seventy-nine percent (11/14) of depressed participants in the intervention condition were in remission (PHQ-9≤4) by week one and 100% (14/14) were in remission at week six. Secondary measures of anxiety, stress, loneliness, and well-being also improved by 15-80% in the intervention group (vs. 0-34% in the control group), ps<0.05. Overall, this brief, immersive psychosocial training program rapidly and substantially improved depression levels and several related secondary outcomes, suggesting that immersive interventions may be useful for reducing depressive symptoms and enhancing well-being.

    View details for DOI 10.1016/j.jpsychires.2022.02.034

    View details for PubMedID 35429739

  • Low expression of EXOSC2 protects against clinical COVID-19 and impedes SARS-CoV-2 replication. bioRxiv : the preprint server for biology Moll, T., Odon, V., Harvey, C., Collins, M. O., Peden, A., Franklin, J., Graves, E., Marshall, J. N., Souza, C. D., Zhang, S., Azzouz, M., Gordon, D., Krogan, N., Ferraiuolo, L., Snyder, M. P., Shaw, P. J., Rehwinkel, J., Cooper-Knock, J. 2022

    Abstract

    New therapeutic targets are a valuable resource in the struggle to reduce the morbidity and mortality associated with the COVID-19 pandemic, caused by the SARS-CoV-2 virus. Genome-wide association studies (GWAS) have identified risk loci, but some loci are associated with co-morbidities and are not specific to host-virus interactions. Here, we identify and experimentally validate a link between reduced expression of EXOSC2 and reduced SARS-CoV-2 replication. EXOSC2 was one of 332 host proteins examined, all of which interact directly with SARS-CoV-2 proteins; EXOSC2 interacts with Nsp8 which forms part of the viral RNA polymerase. Lung-specific eQTLs were identified from GTEx (v7) for each of the 332 host proteins. Aggregating COVID-19 GWAS statistics for gene-specific eQTLs revealed an association between increased expression of EXOSC2 and higher risk of clinical COVID-19 which survived stringent multiple testing correction. EXOSC2 is a component of the RNA exosome and indeed, LC-MS/MS analysis of protein pulldowns demonstrated an interaction between the SARS-CoV-2 RNA polymerase and the majority of human RNA exosome components. CRISPR/Cas9 introduction of nonsense mutations within EXOSC2 in Calu-3 cells reduced EXOSC2 protein expression, impeded SARS-CoV-2 replication and upregulated oligoadenylate synthase (OAS) genes, which have been linked to a successful immune response against SARS-CoV-2. Reduced EXOSC2 expression did not reduce cellular viability. OAS gene expression changes occurred independent of infection and in the absence of significant upregulation of other interferon-stimulated genes (ISGs). Targeted depletion or functional inhibition of EXOSC2 may be a safe and effective strategy to protect at-risk individuals against clinical COVID-19.

    View details for DOI 10.1101/2022.03.06.483172

    View details for PubMedID 35291294

    View details for PubMedCentralID PMC8923113

  • MITI minimum information guidelines for highly multiplexed tissue images. Nature methods Schapiro, D., Yapp, C., Sokolov, A., Reynolds, S. M., Chen, Y., Sudar, D., Xie, Y., Muhlich, J., Arias-Camison, R., Arena, S., Taylor, A. J., Nikolov, M., Tyler, M., Lin, J., Burlingame, E. A., Human Tumor Atlas Network, Chang, Y. H., Farhi, S. L., Thorsson, V., Venkatamohan, N., Drewes, J. L., Pe'er, D., Gutman, D. A., Herrmann, M. D., Gehlenborg, N., Bankhead, P., Roland, J. T., Herndon, J. M., Snyder, M. P., Angelo, M., Nolan, G., Swedlow, J. R., Schultz, N., Merrick, D. T., Mazzili, S. A., Cerami, E., Rodig, S. J., Santagata, S., Sorger, P. K., Abravanel, D. L., Achilefu, S., Ademuyiwa, F. O., Adey, A. C., Aft, R., Ahn, K. J., Alikarami, F., Alon, S., Ashenberg, O., Baker, E., Baker, G. J., Bandyopadhyay, S., Bayguinov, P., Beane, J., Becker, W., Bernt, K., Betts, C. B., Bletz, J., Blosser, T., Boire, A., Boland, G. M., Boyden, E. S., Bucher, E., Bueno, R., Cai, Q., Cambuli, F., Campbell, J., Cao, S., Caravan, W., Chaligne, R., Chan, J. M., Chasnoff, S., Chatterjee, D., Chen, A. A., Chen, C., Chen, C., Chen, B., Chen, F., Chen, S., Chheda, M. G., Chin, K., Cho, H., Chun, J., Cisneros, L., Coffey, R. J., Cohen, O., Colditz, G. A., Cole, K. A., Collins, N., Cotter, D., Coussens, L. M., Coy, S., Creason, A. L., Cui, Y., Zhou, D. C., Curtis, C., Davies, S. R., Bruijn, I., Delorey, T. M., Demir, E., Denardo, D., Diep, D., Ding, L., DiPersio, J., Dubinett, S. M., Eberlein, T. J., Eddy, J. A., Esplin, E. D., Factor, R. E., Fatahalian, K., Feiler, H. S., Fernandez, J., Fields, A., Fields, R. C., Fitzpatrick, J. A., Ford, J. M., Franklin, J., Fulton, B., Gaglia, G., Galdieri, L., Ganesh, K., Gao, J., Gaudio, B. L., Getz, G., Gibbs, D. L., Gillanders, W. E., Goecks, J., Goodwin, D., Gray, J. W., Greenleaf, W., Grimm, L. J., Gu, Q., Guerriero, J. L., Guha, T., Guimaraes, A. R., Gutierrez, B., Hacohen, N., Hanson, C. R., Harris, C. R., Hawkins, W. G., Heiser, C. 
N., Hoffer, J., Hollmann, T. J., Hsieh, J. J., Huang, J., Hunger, S. P., Hwang, E., Iacobuzio-Donahue, C., Iglesia, M. D., Islam, M., Izar, B., Jacobson, C. A., Janes, S., Jayasinghe, R. G., Jeudi, T., Johnson, B. E., Johnson, B. E., Ju, T., Kadara, H., Karnoub, E., Karpova, A., Khan, A., Kibbe, W., Kim, A. H., King, L. M., Kozlowski, E., Krishnamoorthy, P., Krueger, R., Kundaje, A., Ladabaum, U., Laquindanum, R., Lau, C., Lau, K. S., LeBoeuf, N. R., Lee, H., Lenburg, M., Leshchiner, I., Levy, R., Li, Y., Lian, C. G., Liang, W., Lim, K., Lin, Y., Liu, D., Liu, Q., Liu, R., Lo, J., Lo, P., Longabaugh, W. J., Longacre, T., Luckett, K., Ma, C., Maher, C., Maier, A., Makowski, D., Maley, C., Maliga, Z., Manoj, P., Maris, J. M., Markham, N., Marks, J. R., Martinez, D., Mashl, J., Masilionis, I., Massague, J., Mazurowski, M. A., McKinley, E. T., McMichael, J., Meyerson, M., Mills, G. B., Mitri, Z. I., Moorman, A., Mudd, J., Murphy, G. F., Deen, N. N., Navin, N. E., Nawy, T., Ness, R. M., Nevins, S., Nirmal, A. J., Novikov, E., Oh, S. T., Oldridge, D. A., Owzar, K., Pant, S. M., Park, W., Patti, G. J., Paul, K., Pelletier, R., Persson, D., Petty, C., Pfister, H., Polyak, K., Puram, S. V., Qiu, Q., Villalonga, A. Q., Ramirez, M. A., Rashid, R., Reeb, A. N., Reid, M. E., Remsik, J., Riesterer, J. L., Risom, T., Ritch, C. C., Rolong, A., Rudin, C. M., Ryser, M. D., Sato, K., Sears, C. L., Semenov, Y. R., Shen, J., Shoghi, K. I., Shrubsole, M. J., Shyr, Y., Sibley, A. B., Simmons, A. J., Sinha, A., Sivagnanam, S., Song, S., Southar-Smith, A., Spira, A. E., Cyr, J. S., Stefankiewicz, S., Storrs, E. P., Stover, E. H., Strand, S. H., Straub, C., Street, C., Su, T., Surrey, L. F., Suver, C., Tan, K., Terekhanova, N. V., Ternes, L., Thadi, A., Thomas, G., Tibshirani, R., Umeda, S., Uzun, Y., Vallius, T., Van Allen, E. R., Vandekar, S., Vega, P. N., Veis, D. J., Vennam, S., Verma, A., Vigneau, S., Wagle, N., Wahl, R., Walle, T., Wang, L., Warchol, S., Washington, M. 
K., Watson, C., Weimer, A. K., Wendl, M. C., West, R. B., White, S., Windon, A. L., Wu, H., Wu, C., Wu, Y., Wyczalkowski, M. A., Xu, J., Yao, L., Yu, W., Zhang, K., Zhu, X. 2022; 19 (3): 262-267

    View details for DOI 10.1038/s41592-022-01415-4

    View details for PubMedID 35277708

  • Whole transcriptome profiling of prospective endomyocardial biopsies reveals prognostic and diagnostic signatures of cardiac allograft rejection. The Journal of heart and lung transplantation : the official publication of the International Society for Heart Transplantation Piening, B. D., Dowdell, A. K., Zhang, M., Loza, B., Walls, D., Gao, H., Mohebnasab, M., Li, Y. R., Elftmann, E., Wei, E., Gandla, D., Lad, H., Chaib, H., Sweitzer, N. K., Deng, M., Pereira, A. C., Cadeiras, M., Shaked, A., Snyder, M. P., Keating, B. J. 2022

    Abstract

    BACKGROUND: Heart transplantation provides a significant improvement in survival and quality of life for patients with end-stage heart disease; however, many recipients experience different levels of graft rejection that can be associated with significant morbidities and mortality. Current clinical standard-of-care for the evaluation of heart transplant acute rejection (AR) consists of routine endomyocardial biopsy (EMB) followed by visual assessment by histopathology for immune infiltration and cardiomyocyte damage. We assessed whether the sensitivity and/or specificity of this process could be improved upon by adding RNA sequencing (RNA-seq) of EMBs coupled with histopathological interpretation. METHODS: Up to 6 standard-of-care, or for-cause EMBs, were collected from 26 heart transplant recipients from the prospective observational Clinical Trials of Transplantation (CTOT)-03 study, during the first 12-months post-transplant and subjected to RNA-seq (n=125 EMBs total). Differential expression and random-forest-based machine learning were applied to develop signatures for classification and prognostication. RESULTS: Leveraging the unique longitudinal nature of this study, we show that transcriptional hallmarks for significant rejection events occur months before the actual event and are not visible using traditional histopathology. Using this information, we identified a prognostic signature for 0R/1R biopsies that with 90% accuracy can predict whether the next biopsy will be 2R/3R. CONCLUSIONS: RNA-seq-based molecular characterization of EMBs shows significant promise for the early detection of cardiac allograft rejection.

    View details for DOI 10.1016/j.healun.2022.01.1377

    View details for PubMedID 35317953

  • Dual isoform sequencing reveals complex transcriptomic and epitranscriptomic landscapes of a prototype baculovirus. Scientific reports Torma, G., Tombacz, D., Moldovan, N., Fulop, A., Prazsak, I., Csabai, Z., Snyder, M., Boldogkoi, Z. 2022; 12 (1): 1291

    Abstract

    In this study, two long-read sequencing (LRS) techniques, MinION from Oxford Nanopore Technologies and Sequel from Pacific Biosciences, were used for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcript species, of which 759 were novel and 116 were annotated previously. These RNA molecules include 41 novel putative protein coding transcripts [each containing 5'-truncated in-frame open reading frames (ORFs)], 14 monocistronic transcripts, 99 polygenic RNAs, 101 non-coding RNAs, and 504 untranslated region isoforms. This work also identified novel replication origin-associated transcripts, upstream ORFs, cis-regulatory sequences and poly(A) sites. We also detected RNA methylation in 99 viral genes and RNA hyper-editing in the longer 5'-UTR transcript isoform of the canonical ORF 19 transcript.

    View details for DOI 10.1038/s41598-022-05457-8

    View details for PubMedID 35079129

  • Unbiased metabolome screen leads to personalized medicine strategy for amyotrophic lateral sclerosis. Brain communications Boddy, S., Islam, M., Moll, T., Kurz, J., Burrows, D., McGown, A., Bhargava, A., Julian, T. H., Harvey, C., Marshall, J. N., Hall, B. P., Allen, S. P., Kenna, K. P., Sanderson, E., Zhang, S., Ramesh, T., Snyder, M. P., Shaw, P. J., McDermott, C., Cooper-Knock, J. 2022; 4 (2): fcac069

    Abstract

    Amyotrophic lateral sclerosis is a rapidly progressive neurodegenerative disease that affects 1/350 individuals in the United Kingdom. The cause of amyotrophic lateral sclerosis is unknown in the majority of cases. Two-sample Mendelian randomization enables causal inference between an exposure, such as the serum concentration of a specific metabolite, and disease risk. We obtained genome-wide association study summary statistics for serum concentrations of 566 metabolites which were population matched with a genome-wide association study of amyotrophic lateral sclerosis. For each metabolite, we performed Mendelian randomization using an inverse variance weighted estimate for significance testing. After stringent Bonferroni multiple testing correction, our unbiased screen revealed three metabolites that were significantly linked to the risk of amyotrophic lateral sclerosis: Estrone-3-sulphate and bradykinin were protective, which is consistent with literature describing a male preponderance of amyotrophic lateral sclerosis and a preventive effect of angiotensin-converting enzyme inhibitors which inhibit the breakdown of bradykinin. Serum isoleucine was positively associated with amyotrophic lateral sclerosis risk. All three metabolites were supported by robust Mendelian randomization measures and sensitivity analyses; estrone-3-sulphate and isoleucine were confirmed in a validation amyotrophic lateral sclerosis genome-wide association study. Estrone-3-sulphate is metabolized to the more active estradiol by the enzyme 17beta-hydroxysteroid dehydrogenase 1; further, Mendelian randomization demonstrated a protective effect of estradiol and rare variant analysis showed that missense variants within HSD17B1, the gene encoding 17beta-hydroxysteroid dehydrogenase 1, modify risk for amyotrophic lateral sclerosis. Finally, in a zebrafish model of C9ORF72-amyotrophic lateral sclerosis, we present evidence that estradiol is neuroprotective. 
Isoleucine is metabolized via methylmalonyl-CoA mutase encoded by the gene MMUT in a reaction that consumes vitamin B12. Multivariable Mendelian randomization revealed that the toxic effect of isoleucine is dependent on the depletion of vitamin B12; consistent with this, rare variants which reduce the function of MMUT are protective against amyotrophic lateral sclerosis. We propose that amyotrophic lateral sclerosis patients and family members with high serum isoleucine levels should be offered supplementation with vitamin B12.

    View details for DOI 10.1093/braincomms/fcac069

    View details for PubMedID 35441136
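
    The inverse variance weighted (IVW) estimate mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW formula under stated assumptions, not the authors' analysis pipeline; the function name and the per-variant numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    Each list holds one value per genetic variant (instrument):
    variant-exposure effect, variant-outcome effect, and the standard
    error of the variant-outcome effect.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for two variants (illustration only).
estimate, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.1],
    se_outcome=[0.01, 0.02],
)
print(estimate, std_err)
```

    Each variant contributes a ratio of its outcome effect to its exposure effect, weighted by the precision of the outcome association, so more precisely measured variants dominate the combined estimate.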

  • Identification of end-stage renal disease metabolic signatures from human perspiration. Natural Sciences Shankar, V., Michael, B., Celli, A., Zhou, Z., Ashland, M., Tibshirani, R., Snyder, M., Zare, R. 2022

    View details for DOI 10.1002/ntls.20220048

  • Patient-derived gene and protein expression signatures of NGLY1 deficiency. Journal of biochemistry Rauscher, B., Mueller, W. F., Clauder-Munster, S., Jakob, P., Islam, M. S., Sun, H., Ghidelli-Disse, S., Boesche, M., Bantscheff, M., Pflaumer, H., Collier, P., Haase, B., Chen, S., Hoffman, R., Wang, G., Benes, V., Drewes, G., Snyder, M., Steinmetz, L. M. 2021

    Abstract

    N-Glycanase 1 (NGLY1) deficiency is a rare and complex genetic disorder. Although recent studies have shed light on the molecular underpinnings of NGLY1 deficiency, a systematic characterization of gene and protein expression changes in patient-derived cells has been lacking. Here, we performed RNA-sequencing and mass spectrometry to determine the transcriptomes and proteomes of 66 cell lines representing 4 different cell types derived from 14 NGLY1 deficient patients and 17 controls. Although NGLY1 protein levels were up to 9.5-fold downregulated in patients compared to parents, residual and likely non-functional NGLY1 protein was detectable in all patient-derived lymphoblastoid cell lines. Consistent with the role of NGLY1 as a regulator of the transcription factor Nrf1, we observed a cell type-independent downregulation of proteasomal genes in NGLY1 deficient cells. In contrast, genes involved in ribosome biogenesis and mRNA processing were upregulated in multiple cell types. In addition, we observed cell type-specific effects. For example, genes and proteins involved in glutathione synthesis, such as the glutamate-cysteine ligase subunits GCLC and GCLM, were downregulated specifically in lymphoblastoid cells. We provide a web application that enables access to all results generated in this study at https://apps.embl.de/ngly1browser. This resource will guide future studies of NGLY1 deficiency in directions that are most relevant to patients.

    View details for DOI 10.1093/jb/mvab131

    View details for PubMedID 34878535

  • Tet enzymes are essential for early embryogenesis and completion of embryonic genome activation. EMBO reports Arand, J., Chiang, H. R., Martin, D., Snyder, M. P., Sage, J., Reijo Pera, R. A., Wossidlo, M. 2021: e53968

    Abstract

    Mammalian development begins in transcriptional silence followed by a period of widespread activation of thousands of genes. DNA methylation reprogramming is integral to embryogenesis and linked to Tet enzymes, but their function in early development is not well understood. Here, we generate combined deficiencies of all three Tet enzymes in mouse oocytes using a morpholino-guided knockdown approach and study the impact of acute Tet enzyme deficiencies on preimplantation development. Tet1-3 deficient embryos arrest at the 2-cell stage with the most severe phenotype linked to Tet2. Individual Tet enzymes display non-redundant roles in the consecutive oxidation of 5-methylcytosine to 5-carboxylcytosine. Gene expression analysis uncovers that Tet enzymes are required for completion of embryonic genome activation (EGA) and fine-tuned expression of transposable elements and chimeric transcripts. Whole-genome bisulfite sequencing reveals minor changes of global DNA methylation in Tet-deficient 2-cell embryos, suggesting an important role of non-catalytic functions of Tet enzymes in early embryogenesis. Our results demonstrate that Tet enzymes are key components of the clock that regulates the timing and extent of EGA in mammalian embryos.

    View details for DOI 10.15252/embr.202153968

    View details for PubMedID 34866320

  • Cross-Laboratory Standardization of Preclinical Lipidomics Using Differential Mobility Spectrometry and Multiple Reaction Monitoring. Analytical chemistry Ghorasaini, M., Mohammed, Y., Adamski, J., Bettcher, L., Bowden, J. A., Cabruja, M., Contrepois, K., Ellenberger, M., Gajera, B., Haid, M., Hornburg, D., Hunter, C., Jones, C. M., Klein, T., Mayboroda, O., Mirzaian, M., Moaddel, R., Ferrucci, L., Lovett, J., Nazir, K., Pearson, M., Ubhi, B. K., Raftery, D., Riols, F., Sayers, R., Sijbrands, E. J., Snyder, M. P., Su, B., Velagapudi, V., Williams, K. J., de Rijke, Y. B., Giera, M. 2021

    Abstract

    Modern biomarker and translational research as well as personalized health care studies rely heavily on powerful omics technologies, including metabolomics and lipidomics. However, to translate metabolomics and lipidomics discoveries into a high-throughput clinical setting, standardization is of utmost importance. Here, we compared and benchmarked a quantitative lipidomics platform. The employed Lipidyzer platform is based on lipid class separation by means of differential mobility spectrometry with subsequent multiple reaction monitoring. Quantitation is achieved by the use of 54 deuterated internal standards and an automated informatics approach. We investigated the platform performance across nine laboratories using NIST SRM 1950-Metabolites in Frozen Human Plasma, and three NIST Candidate Reference Materials 8231-Frozen Human Plasma Suite for Metabolomics (high triglyceride, diabetic, and African-American plasma). In addition, we comparatively analyzed 59 plasma samples from individuals with familial hypercholesterolemia from a clinical cohort study. We provide evidence that the more practical methyl-tert-butyl ether extraction outperforms the classic Bligh and Dyer approach and compare our results with two previously published ring trials. In summary, we present standardized lipidomics protocols, allowing for the highly reproducible analysis of several hundred human plasma lipids, and present detailed molecular information for potentially disease relevant and ethnicity-related materials.

    View details for DOI 10.1021/acs.analchem.1c02826

    View details for PubMedID 34859676

  • Human exposome assessment platform. Environmental epidemiology (Philadelphia, Pa.) Merino Martinez, R., Muller, H., Negru, S., Ormenisan, A., Arroyo Muhr, L. S., Zhang, X., Trier Moller, F., Clements, M. S., Kozlakidis, Z., Pimenoff, V. N., Wilkowski, B., Boeckhout, M., Ohman, H., Chong, S., Holzinger, A., Lehtinen, M., van Veen, E., Bala, P., Widschwendter, M., Dowling, J., Tornroos, J., Snyder, M. P., Dillner, J. 2021; 5 (6): e182

    Abstract

    The Human Exposome Assessment Platform (HEAP) is a research resource for the integrated and efficient management and analysis of human exposome data. The project will provide the complete workflow for obtaining exposome actionable knowledge from population-based cohorts. HEAP is a state-of-the-science service composed of computational resources from partner institutions, accessed through a software framework that provides the world's fastest Hadoop platform for data warehousing and applied artificial intelligence (AI). The software will provide a decision support system for researchers and policymakers. All the data managed and processed by HEAP, together with the analysis pipelines, will be available for future research. In addition, the platform enables adding new data and analysis pipelines. HEAP's final product can be deployed in multiple instances to create a network of shareable and reusable knowledge on the impact of exposures on public health.

    View details for DOI 10.1097/EE9.0000000000000182

    View details for PubMedID 34909561

  • Design and Methods of the Validating Injury to the Renal Transplant Using Urinary Signatures (VIRTUUS) Study in Children. Transplantation direct Kumar, J., Contrepois, K., Snyder, M., Grimm, P. C., Moudgil, A., Smith, J. M., Bobrowski, A. E., Verghese, P. S., Hooper, D., Ingulli, E., Lestz, R., Weng, P., Reason, J. L., Blydt-Hansen, T. D., Suthanthiran, M., Keating, B., Amaral, S. 2021; 7 (12): e791

    Abstract

    Lack of noninvasive diagnostic and prognostic biomarkers to reliably detect early allograft injury poses a major hindrance to long-term allograft survival in pediatric kidney transplant recipients. Methods: Validating Injury to the Renal Transplant Using Urinary Signatures Children's Study, a North American multicenter prospective cohort study of pediatric kidney transplant recipients, aims to validate urinary cell mRNA and metabolite profiles that were diagnostic and prognostic of acute cellular rejection (ACR) and BK virus nephropathy (BKVN) in adult kidney transplant recipients in Clinical Trials in Organ Transplantation-4. Specifically, we are investigating: (1) whether a urinary cell mRNA 3-gene signature (18S-normalized CD3epsilon, CXCL10 mRNA, and 18S ribosomal RNA) discriminates biopsies with versus without ACR, (2) whether a combined metabolite profile with the 3-gene signature increases sensitivity and specificity of diagnosis and prognostication of ACR, and (3) whether BKV-VP1 mRNA levels in urinary cells are diagnostic of BKVN and prognostic for allograft failure. Results: To date, 204 subjects are enrolled, with 1405 urine samples, including 144 biopsy-associated samples. Among 424 urine samples processed for mRNA, the median A260:280 ratio (RNA purity) was 1.91, comparable with Clinical Trials in Organ Transplantation-4 (median 1.82). The quality control failure rate was 10%. Preliminary results from urine supernatant showed that our metabolomics platform successfully captured a broad array of metabolites. Clustering of pool samples and overlay of samples from various batches demonstrated platform robustness. No study site effect was noted. Conclusions: Multicenter efforts to ascertain urinary biomarkers in pediatric kidney transplant recipients are feasible with high-quality control. Further study will inform whether these signatures are discriminatory and predictive for rejection and infection.

    View details for DOI 10.1097/TXD.0000000000001244

    View details for PubMedID 34805493

  • Network biology bridges the gaps between quantitative genetics and multi-omics to map complex diseases. Current opinion in chemical biology Wu, S., Chen, D., Snyder, M. P. 2021; 66: 102101

    Abstract

    With advances in high-throughput sequencing technologies, quantitative genetics approaches have provided insights into the genetic basis of many complex diseases. Emerging in-depth multi-omics profiling technologies have created exciting opportunities for systematically investigating intricate interaction networks with different layers of biological molecules underlying disease etiology. Herein, we summarize two main categories of biological networks: evidence-based and statistically inferred. These different types of molecular networks complement each other at both bulk and single-cell levels. We also review three main strategies to incorporate quantitative genetics results with multi-omics data by network analysis: (a) network propagation, (b) functional module-based methods, (c) comparative/dynamic networks. These strategies not only aid in elucidating molecular mechanisms of complex diseases but can guide the search for therapeutic targets.

    View details for DOI 10.1016/j.cbpa.2021.102101

    View details for PubMedID 34861483

  • Master lineage transcription factors anchor trans mega transcriptional complexes at highly accessible enhancer sites to promote long-range chromatin clustering and transcription of distal target genes. Nucleic acids research White, S. M., Snyder, M. P., Yi, C. 2021

    Abstract

    The term 'super enhancers' (SE) has been widely used to describe stretches of closely localized enhancers that are occupied collectively by large numbers of transcription factors (TFs) and co-factors, and control the transcription of highly-expressed genes. Through integrated analysis of >600 DNase-seq, ChIP-seq, GRO-seq, STARR-seq, RNA-seq, Hi-C and ChIA-PET data in five human cancer cell lines, we identified a new class of autonomous SEs (aSEs) that are excluded from classic SE calls by the widely used Rank Ordering of Super-Enhancers (ROSE) method. TF footprint analysis revealed that compared to classic SEs and regular enhancers, aSEs are tightly bound by a dense array of master lineage TFs, which serve as anchors to recruit additional TFs and co-factors in trans. In addition, aSEs are preferentially enriched for Cohesins, which are likely involved in stabilizing long-distance interactions between aSEs and their distal target genes. Finally, we showed that aSEs can be reliably predicted using a single DNase-seq dataset, alone or combined with Mediator and/or P300 ChIP-seq. Overall, our study demonstrates that aSEs represent a unique class of functionally important enhancer elements that distally regulate the transcription of highly expressed genes.

    View details for DOI 10.1093/nar/gkab1105

    View details for PubMedID 34850122

  • Spatial mapping of protein composition and tissue organization: a primer for multiplexed antibody-based imaging. Nature methods Hickey, J. W., Neumann, E. K., Radtke, A. J., Camarillo, J. M., Beuschel, R. T., Albanese, A., McDonough, E., Hatler, J., Wiblin, A. E., Fisher, J., Croteau, J., Small, E. C., Sood, A., Caprioli, R. M., Angelo, R. M., Nolan, G. P., Chung, K., Hewitt, S. M., Germain, R. N., Spraggins, J. M., Lundberg, E., Snyder, M. P., Kelleher, N. L., Saka, S. K. 2021

    Abstract

    Tissues and organs are composed of distinct cell types that must operate in concert to perform physiological functions. Efforts to create high-dimensional biomarker catalogs of these cells have been largely based on single-cell sequencing approaches, which lack the spatial context required to understand critical cellular communication and correlated structural organization. To probe in situ biology with sufficient depth, several multiplexed protein imaging methods have been recently developed. Though these technologies differ in strategy and mode of immunolabeling and detection tags, they commonly utilize antibodies directed against protein biomarkers to provide detailed spatial and functional maps of complex tissues. As these promising antibody-based multiplexing approaches become more widely adopted, new frameworks and considerations are critical for training future users, generating molecular tools, validating antibody panels, and harmonizing datasets. In this Perspective, we provide essential resources, key considerations for obtaining robust and reproducible imaging data, and specialized knowledge from domain experts and technology developers.

    View details for DOI 10.1038/s41592-021-01316-y

    View details for PubMedID 34811556

  • A review of Mendelian randomization in amyotrophic lateral sclerosis. Brain : a journal of neurology Julian, T. H., Boddy, S., Islam, M., Kurz, J., Whittaker, K. J., Moll, T., Harvey, C., Zhang, S., Snyder, M. P., McDermott, C., Cooper-Knock, J., Shaw, P. J. 2021

    Abstract

    Amyotrophic lateral sclerosis (ALS) is a relatively common and rapidly progressive neurodegenerative disease which, in the majority of cases, is thought to be determined by a complex gene-environment interaction. Exponential growth in the number of performed genome-wide association studies (GWAS), combined with the advent of Mendelian randomization (MR) is opening significant new opportunities to identify environmental exposures which increase or decrease the risk of ALS. Each of these discoveries has the potential to shape new therapeutic interventions. However, to do so rigorous methodological standards must be applied in the performance of MR. We have performed a review of MR studies performed in ALS to date. We identified 20 MR studies, including evaluation of physical exercise, adiposity, cognitive performance, immune function, blood lipids, sleep behaviours, educational attainment, alcohol consumption, smoking and type 2 diabetes mellitus. We have evaluated each study using gold standard methodology supported by the MR literature and the STROBE-MR checklist. Where discrepancies exist between MR studies, we suggest the underlying reasons. A number of studies conclude that there is a causal link between blood lipids and risk of ALS; replication across different datasets and even different populations adds confidence. For other putative risk factors, such as smoking and immune function, MR studies have provided cause for doubt. We highlight the use of positive control analyses in choosing exposure SNPs to make up the MR instrument, use of SNP clumping to avoid false positive results due to SNPs in linkage, and the importance of multiple testing correction. We discuss the implications of survival bias for study of late age of onset diseases such as ALS, and make recommendations to mitigate this potentially important confounder. For MR to be useful to the ALS field, high methodological standards must be applied to ensure reproducibility. 
    MR is already an impactful tool but poor-quality studies will lead to incorrect interpretations by a field which includes non-statisticians, wasted resources and missed opportunities.

    View details for DOI 10.1093/brain/awab420

    View details for PubMedID 34791088

  • In-depth triacylglycerol profiling using MS3 Q-Trap mass spectrometry. Analytica chimica acta Cabruja, M., Priotti, J., Domizi, P., Papsdorf, K., Kroetz, D. L., Brunet, A., Contrepois, K., Snyder, M. P. 2021; 1184: 339023

    Abstract

    Total triacylglycerol (TAG) level is a key clinical marker of metabolic and cardiovascular diseases. However, the roles of individual TAGs have not been thoroughly explored, in part due to their extreme structural complexity. We present a targeted mass spectrometry-based method combining multiple reaction monitoring (MRM) and multiple stage mass spectrometry (MS3) for the comprehensive qualitative and semiquantitative profiling of TAGs. This method, referred to as TriP-MS3 (triacylglycerol profiling using MS3), screens for more than 6,700 TAG species in a fully automated fashion. TriP-MS3 demonstrated excellent reproducibility (median interday CV 0.15) and linearity (median R2 = 0.978) and detected 285 individual TAG species in human plasma. The semiquantitative accuracy of the method was validated by comparison with a state-of-the-art reverse phase liquid chromatography (RPLC)-MS method (R2 = 0.83), which is the most commonly used approach for TAG profiling. Finally, we demonstrate the utility and the versatility of the method by characterizing the effects of a fatty acid desaturase inhibitor on TAG profiles in vitro and by profiling TAGs in Caenorhabditis elegans.

    View details for DOI 10.1016/j.aca.2021.339023

    View details for PubMedID 34625255

  • COVID-19-Induced New-Onset Diabetes: Trends and Technologies. Diabetes Metwally, A. A., Mehta, P., Johnson, B. S., Nagarjuna, A., Snyder, M. P. 2021

    Abstract

    The coronavirus disease 2019 (COVID-19) global pandemic continues to spread worldwide with approximately 216 million confirmed cases and 4.49 million deaths to date. Intensive efforts are ongoing to combat this disease by suppressing viral transmission, understanding its pathogenesis, developing vaccination strategies, and identifying effective therapeutic targets. Individuals with preexisting diabetes also show higher incidence of COVID-19 illness and poorer prognosis upon infection. Likewise, an increased frequency of diabetes onset and diabetes complications has been reported in patients following COVID-19 diagnosis. COVID-19 may elevate the risk of hyperglycemia and other complications in patients with and without prior diabetes history. It is unclear whether the virus induces type 1 or type 2 diabetes or instead causes a novel atypical form of diabetes. Moreover, it remains unknown if recovering COVID-19 patients exhibit a higher risk of developing new-onset diabetes or its complications going forward. The aim of this review is to summarize what is currently known about the epidemiology and mechanisms of this bidirectional relationship between COVID-19 and diabetes. We highlight major challenges that hinder the study of COVID-19-induced new-onset of diabetes and propose a potential framework for overcoming these obstacles. We also review state-of-the-art wearables and microsampling technologies that can further study diabetes management and progression in new-onset diabetes cases. We conclude by outlining current research initiatives investigating the bidirectional relationship between COVID-19 and diabetes, some with emphasis on wearable technology.

    View details for DOI 10.2337/dbi21-0029

    View details for PubMedID 34686519

  • Altered Cardiac Energetics and Mitochondrial Dysfunction in Hypertrophic Cardiomyopathy. Circulation Ranjbarvaziri, S., Kooiker, K. B., Ellenberger, M., Fajardo, G., Zhao, M., Vander Roest, A. S., Woldeyes, R. A., Koyano, T. T., Fong, R., Ma, N., Tian, L., Traber, G. M., Chan, F., Perrino, J., Reddy, S., Chiu, W., Wu, J. C., Woo, J. Y., Ruppel, K. M., Spudich, J. A., Snyder, M. P., Contrepois, K., Bernstein, D. 2021

    Abstract

    Background: Hypertrophic cardiomyopathy (HCM) is a complex disease partly explained by the effects of individual gene variants on sarcomeric protein biomechanics. At the cellular level, HCM mutations most commonly enhance force production, leading to higher energy demands. Despite significant advances in elucidating sarcomeric structure-function relationships, there is still much to be learned about the mechanisms that link altered cardiac energetics to HCM phenotypes. In this work, we test the hypothesis that changes in cardiac energetics represent a common pathophysiologic pathway in HCM. Methods: We performed a comprehensive multi-omics profile of the molecular (transcripts, metabolites, and complex lipids), ultrastructural, and functional components of HCM energetics using myocardial samples from 27 HCM patients and 13 normal controls (donor hearts). Results: Integrated omics analysis revealed alterations in a wide array of biochemical pathways with major dysregulation in fatty acid metabolism, reduction of acylcarnitines, and accumulation of free fatty acids. HCM hearts showed evidence of global energetic decompensation manifested by a decrease in high energy phosphate metabolites [ATP, ADP, and phosphocreatine (PCr)] and a reduction in mitochondrial genes involved in creatine kinase and ATP synthesis. Accompanying these metabolic derangements, electron microscopy showed an increased fraction of severely damaged mitochondria with reduced cristae density, coinciding with reduced citrate synthase (CS) activity and mitochondrial oxidative respiration. These mitochondrial abnormalities were associated with elevated reactive oxygen species (ROS) and reduced antioxidant defenses. However, despite significant mitochondrial injury, HCM hearts failed to upregulate mitophagic clearance. Conclusions: Overall, our findings suggest that perturbed metabolic signaling and mitochondrial dysfunction are common pathogenic mechanisms in patients with HCM. 
    These results highlight potential new drug targets for attenuation of the clinical disease through improving metabolic function and reducing mitochondrial injury.

    View details for DOI 10.1161/CIRCULATIONAHA.121.053575

    View details for PubMedID 34672721

  • The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nature genetics Kim, D. S., Risca, V. I., Reynolds, D. L., Chappell, J., Rubin, A. J., Jung, N., Donohue, L. K., Lopez-Pajares, V., Kathiria, A., Shi, M., Zhao, Z., Deep, H., Sharmin, M., Rao, D., Lin, S., Chang, H. Y., Snyder, M. P., Greenleaf, W. J., Kundaje, A., Khavari, P. A. 2021

    Abstract

    Transcription factors bind DNA sequence motif vocabularies in cis-regulatory elements (CREs) to modulate chromatin state and gene expression during cell state transitions. A quantitative understanding of how motif lexicons influence dynamic regulatory activity has been elusive due to the combinatorial nature of the cis-regulatory code. To address this, we undertook multiomic data profiling of chromatin and expression dynamics across epidermal differentiation to identify 40,103 dynamic CREs associated with 3,609 dynamically expressed genes, then applied an interpretable deep-learning framework to model the cis-regulatory logic of chromatin accessibility. This analysis framework identified cooperative DNA sequence rules in dynamic CREs regulating synchronous gene modules with diverse roles in skin differentiation. Massively parallel reporter assay analysis validated temporal dynamics and cooperative cis-regulatory logic. Variants linked to human polygenic skin disease were enriched in these time-dependent combinatorial motif rules. This integrative approach shows the combinatorial cis-regulatory lexicon of epidermal differentiation and represents a general framework for deciphering the organizational principles of the cis-regulatory code of dynamic gene regulation.

    View details for DOI 10.1038/s41588-021-00947-3

    View details for PubMedID 34650237

  • Divergent patterns of selection on metabolite levels and gene expression. BMC ecology and evolution Kern, A. F., Yang, G. X., Khosla, N. M., Ang, R. M., Snyder, M. P., Fraser, H. B. 2021; 21 (1): 185

    Abstract

    BACKGROUND: Natural selection can act on multiple genes in the same pathway, leading to polygenic adaptation. For example, adaptive changes were found to down-regulate six genes involved in ergosterol biosynthesis (an essential pathway targeted by many antifungal drugs) in some strains of the yeast Saccharomyces cerevisiae. However, the impact of this polygenic adaptation on metabolite levels was unknown. Here, we performed targeted mass spectrometry to measure the levels of eight metabolites in this pathway in 74 yeast strains from a genetic cross. RESULTS: Through quantitative trait locus (QTL) mapping we identified 19 loci affecting ergosterol pathway metabolite levels, many of which overlap loci that also impact gene expression within the pathway. We then used the recently developed v-test, which identified selection acting upon three metabolite levels within the pathway, none of which were predictable from the gene expression adaptation. CONCLUSIONS: These data showed that effects of selection on metabolite levels were complex and not predictable from gene expression data. This suggests that a deeper understanding of metabolism is necessary before we can understand the impacts of even relatively straightforward gene expression adaptations on metabolic pathways.

    View details for DOI 10.1186/s12862-021-01915-5

    View details for PubMedID 34587900

  • Statins Are Associated With Increased Insulin Resistance and Secretion. Arteriosclerosis, thrombosis, and vascular biology Abbasi, F., Lamendola, C., Harris, C. S., Harris, V., Tsai, M., Tripathi, P., Abbas, F., Reaven, G., Reaven, P., Snyder, M. P., Kim, S. H., Knowles, J. W. 2021: ATVBAHA121316159

    Abstract

    OBJECTIVE: Statin treatment reduces the risk of atherosclerotic cardiovascular disease but is associated with a modest increased risk of type 2 diabetes, especially in those with insulin resistance or prediabetes. Our objective was to determine the physiological mechanism for the increased type 2 diabetes risk. APPROACH AND RESULTS: We conducted an open-label clinical trial of atorvastatin 40 mg daily in adults without known atherosclerotic cardiovascular disease or type 2 diabetes at baseline. The co-primary outcomes were changes at 10 weeks versus baseline in insulin resistance as assessed by steady-state plasma glucose during the insulin suppression test and insulin secretion as assessed by insulin secretion rate area under the curve (ISR AUC) during the graded-glucose infusion test. Secondary outcomes included glucose and insulin, both fasting and during oral glucose tolerance test. Of 75 participants who enrolled, 71 completed the study (median age 61 years, 37% women, 65% non-Hispanic White, median body mass index 27.8 kg/m2). Atorvastatin reduced LDL (low-density lipoprotein)-cholesterol (median decrease 53%, P<0.001) but did not change body weight. Compared with baseline, atorvastatin increased insulin resistance (steady-state plasma glucose) by a median of 8% (P=0.01) and insulin secretion (ISR AUC) by a median of 9% (P<0.001). There were small increases in oral glucose tolerance test glucose AUC (median increase, 0.05%; P=0.03) and fasting insulin (median increase, 7%; P=0.01). CONCLUSIONS: In individuals without type 2 diabetes, high-intensity atorvastatin for 10 weeks increases insulin resistance and insulin secretion. Over time, the risk of new-onset diabetes with statin use may increase in individuals who become more insulin resistant but are unable to maintain compensatory increases in insulin secretion. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT02437084.

    View details for DOI 10.1161/ATVBAHA.121.316159

    View details for PubMedID 34433298

  • Temporal changes in soluble angiotensin-converting enzyme 2 associated with metabolic health, body composition, and proteome dynamics during a weight loss diet intervention: a randomized trial with implications for the COVID-19 pandemic. The American journal of clinical nutrition Cauwenberghs, N., Prunicki, M., Sabovcik, F., Perelman, D., Contrepois, K., Li, X., Snyder, M. P., Nadeau, K. C., Kuznetsova, T., Haddad, F., Gardner, C. D. 2021

    Abstract

    BACKGROUND: Angiotensin-converting enzyme 2 (ACE2) serves protective functions in metabolic, cardiovascular, renal, and pulmonary diseases and is linked to COVID-19 pathology. The correlates of temporal changes in soluble ACE2 (sACE2) remain understudied. OBJECTIVES: We explored the associations of sACE2 with metabolic health and proteome dynamics during a weight loss diet intervention. METHODS: We analyzed 457 healthy individuals (mean±SD age: 39.8±6.6 y) with BMI 28-40 kg/m2 in the DIETFITS (Diet Intervention Examining the Factors Interacting with Treatment Success) study. Biochemical markers of metabolic health and 236 proteins were measured by Olink CVDII, CVDIII, and Inflammation I arrays at baseline and at 6 mo during the dietary intervention. We determined clinical and routine biochemical correlates of the diet-induced change in sACE2 (ΔsACE2) using stepwise linear regression. We combined feature selection models and multivariable-adjusted linear regression to identify protein dynamics associated with ΔsACE2. RESULTS: sACE2 decreased on average at 6 mo during the diet intervention. Stronger decline in sACE2 during the diet intervention was independently associated with female sex, lower HOMA-IR and LDL cholesterol at baseline, and a stronger decline in HOMA-IR, triglycerides, HDL cholesterol, and fat mass. Participants with decreasing HOMA-IR (OR: 1.97; 95% CI: 1.28, 3.03) and triglycerides (OR: 2.71; 95% CI: 1.72, 4.26) had significantly higher odds for a decrease in sACE2 during the diet intervention than those without (P≤0.0073). Feature selection models linked ΔsACE2 to changes in alpha-1-microglobulin/bikunin precursor, E-selectin, hydroxyacid oxidase 1, kidney injury molecule 1, tyrosine-protein kinase Mer, placental growth factor, thrombomodulin, and TNF receptor superfamily member 10B. ΔsACE2 remained associated with these protein changes in multivariable-adjusted linear regression. CONCLUSIONS: Decrease in sACE2 during a weight loss diet intervention was associated with improvements in metabolic health, fat mass, and markers of angiotensin peptide metabolism, hepatic and vascular injury, renal function, chronic inflammation, and oxidative stress. Our findings may improve the risk stratification, prevention, and management of cardiometabolic complications. This trial was registered at clinicaltrials.gov as NCT01826591.

    View details for DOI 10.1093/ajcn/nqab243

    View details for PubMedID 34375388

  • Prediction of Immunotherapy Response in Melanoma through Combined Modeling of Neoantigen Burden and Immune-Related Resistance Mechanisms. Clinical cancer research : an official journal of the American Association for Cancer Research Abbott, C. W., Boyle, S. M., Pyke, R. M., McDaniel, L. D., Levy, E., Navarro, F. C., Mellacheruvu, D., Zhang, S. V., Tan, M., Santiago, R., Rusan, Z. M., Milani, P., Bartha, G., Harris, J., McClory, R., Snyder, M. P., Jang, S., Chen, R. 2021; 27 (15): 4265-4276

    Abstract

    PURPOSE: While immune checkpoint blockade (ICB) has become a pillar of cancer treatment, biomarkers that consistently predict patient response remain elusive due to the complex mechanisms driving immune response to tumors. We hypothesized that a multi-dimensional approach modeling both tumor and immune-related molecular mechanisms would better predict ICB response than simpler mutation-focused biomarkers, such as tumor mutational burden (TMB). EXPERIMENTAL DESIGN: Tumors from a cohort of patients with late-stage melanoma (n = 51) were profiled using an immune-enhanced exome and transcriptome platform. We demonstrate increasing predictive power with deeper modeling of neoantigens and immune-related resistance mechanisms to ICB. RESULTS: Our neoantigen burden score, which integrates both exome and transcriptome features, more significantly stratified responders and nonresponders (P = 0.016) than TMB alone (P = 0.049). Extension of this model to include immune-related resistance mechanisms affecting the antigen presentation machinery, such as HLA allele-specific LOH, resulted in a composite neoantigen presentation score (NEOPS) that demonstrated further increased association with therapy response (P = 0.002). CONCLUSIONS: NEOPS proved the statistically strongest biomarker compared with all single-gene biomarkers, expression signatures, and TMB biomarkers evaluated in this cohort. Subsequent confirmation of these findings in an independent cohort of patients (n = 110) suggests that NEOPS is a robust, novel biomarker of ICB response in melanoma.

    View details for DOI 10.1158/1078-0432.CCR-20-4314

    View details for PubMedID 34341053

  • Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment. Genome research Roodgar, M., Good, B. H., Garud, N. R., Martis, S., Avula, M., Zhou, W., Lancaster, S. M., Lee, H., Babveyh, A., Nesamoney, S., Pollard, K. S., Snyder, M. P. 2021

    Abstract

    Gut microbial communities can respond to antibiotic perturbations by rapidly altering their taxonomic and functional composition. However, little is known about the strain-level processes that drive this collective response. Here, we characterize the gut microbiome of a single individual at high temporal and genetic resolution through a period of health, disease, antibiotic treatment, and recovery. We used deep, linked-read metagenomic sequencing to track the longitudinal trajectories of thousands of single nucleotide variants within 36 species, which allowed us to contrast these genetic dynamics with the ecological fluctuations at the species level. We found that antibiotics can drive rapid shifts in the genetic composition of individual species, often involving incomplete genome-wide sweeps of pre-existing variants. These genetic changes were frequently observed in species without obvious changes in species abundance, emphasizing the importance of monitoring diversity below the species level. We also found that many sweeping variants quickly reverted to their baseline levels once antibiotic treatment had concluded, demonstrating that the ecological resilience of the microbiota can sometimes extend all the way down to the genetic level. Our results provide new insights into the population genetic forces that shape individual microbiomes on therapeutically relevant timescales, with potential implications for personalized health and disease.

    View details for DOI 10.1101/gr.265058.120

    View details for PubMedID 34301627

  • Time-Course Transcriptome Profiling of a Poxvirus Using Long-Read Full-Length Assay. Pathogens (Basel, Switzerland) Tombacz, D., Prazsak, I., Torma, G., Csabai, Z., Balazs, Z., Moldovan, N., Denes, B., Snyder, M., Boldogkoi, Z. 2021; 10 (8)

    Abstract

    Viral transcriptomes that are determined using first- and second-generation sequencing techniques are incomplete. Due to the short read length, these methods are inefficient or fail to distinguish between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive for the identification of splice and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends, and in multi-spliced transcripts. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs, and a high degree of polycistronism that leads to enormous complexity. We applied single-molecule, real-time, and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.

    View details for DOI 10.3390/pathogens10080919

    View details for PubMedID 34451383

  • The Exposome in the Era of the Quantified Self. Annual review of biomedical data science Zhang, X., Gao, P., Snyder, M. P. 2021; 4: 255-277

    Abstract

    Human health is regulated by complex interactions among the genome, the microbiome, and the environment. While extensive research has been conducted on the human genome and microbiome, little is known about the human exposome. The exposome comprises the totality of chemical, biological, and physical exposures that individuals encounter over their lifetimes. Traditional environmental and biological monitoring only targets specific substances, whereas exposomic approaches identify and quantify thousands of substances simultaneously using nontargeted high-throughput and high-resolution analyses. The quantified self (QS) aims at enhancing our understanding of human health and disease through self-tracking. QS measurements are critical in exposome research, as external exposures impact an individual's health, behavior, and biology. This review discusses both the achievements and the shortcomings of current research and methodologies on the QS and the exposome and proposes future research directions.

    View details for DOI 10.1146/annurev-biodatasci-012721-122807

    View details for PubMedID 34465170

  • Combined nanopore and single-molecule real-time sequencing survey of human betaherpesvirus 5 transcriptome. Scientific reports Kakuk, B., Tombacz, D., Balazs, Z., Moldovan, N., Csabai, Z., Torma, G., Megyeri, K., Snyder, M., Boldogkoi, Z. 2021; 11 (1): 14487

    Abstract

    Long-read sequencing (LRS), a powerful novel approach, is able to read full-length transcripts and confers a major advantage over the earlier gold standard, short-read sequencing, in the efficiency of identifying, for example, polycistronic transcripts and transcript isoforms, including transcript length and splice variants. In this work, we profile the human cytomegalovirus transcriptome using two third-generation LRS platforms: the Sequel from Pacific BioSciences, and MinION from Oxford Nanopore Technologies. We carried out both cDNA and direct RNA sequencing, and applied the LoRTIA software, developed in our laboratory, for the transcript annotations. This study identified a large number of novel transcript variants, including splice isoforms and transcript start and end site isoforms, as well as putative mRNAs with truncated in-frame ORFs (located within the larger ORFs of the canonical mRNAs), which potentially encode N-terminally truncated polypeptides. Our work also disclosed a highly complex meshwork of transcriptional read-throughs and overlaps.

    View details for DOI 10.1038/s41598-021-93593-y

    View details for PubMedID 34262076

  • Multi-Omic, Longitudinal Profile of Third-Trimester Pregnancies Identifies a Molecular Switch That Predicts the Onset of Labor. Stelzer, I., Ghaemi, M., Han, X., Ando, K., Hedou, J., Feyaerts, D., Peterson, L., Ganio, E., Tsai, A., Tsai, E., Rumer, K., Stanley, N., Fallazadeh, R., Becker, M., Culos, A., Gaudilliere, D., Wong, R., Winn, V., Shaw, G., Snyder, M., Stevenson, D., Contrepois, K., Angst, M., Aghaeepour, N., Gaudilliere, B. SPRINGER HEIDELBERG. 2021: 233A-234A
  • Pan-cancer survey of HLA loss of heterozygosity using a robustly validated NGS-based machine learning algorithm. Pyke, R., Mellacheruvu, D., Abbott, C., Dea, S., Levy, E., Zhang, S. V., Bedi, N., Colevas, A., Bhave, D., Chinnappa, M., Bartha, G., Lyle, J., West, J., Snyder, M., Sunwoo, J., Chen, R., Boyle, S. AMER ASSOC CANCER RESEARCH. 2021
  • Mass spectrometry-based metabolomics: a guide for annotation, quantification and best reporting practices. Nature methods Alseekh, S., Aharoni, A., Brotman, Y., Contrepois, K., D'Auria, J., Ewald, J., C Ewald, J., Fraser, P. D., Giavalisco, P., Hall, R. D., Heinemann, M., Link, H., Luo, J., Neumann, S., Nielsen, J., Perez de Souza, L., Saito, K., Sauer, U., Schroeder, F. C., Schuster, S., Siuzdak, G., Skirycz, A., Sumner, L. W., Snyder, M. P., Tang, H., Tohge, T., Wang, Y., Wen, W., Wu, S., Xu, G., Zamboni, N., Fernie, A. R. 2021; 18 (7): 747-756

    Abstract

    Mass spectrometry-based metabolomics approaches can enable detection and quantification of many thousands of metabolite features simultaneously. However, compound identification and reliable quantification are greatly complicated owing to the chemical complexity and dynamic range of the metabolome. Simultaneous quantification of many metabolites within complex mixtures can additionally be complicated by ion suppression, fragmentation and the presence of isomers. Here we present guidelines covering sample preparation, replication and randomization, quantification, recovery and recombination, ion suppression and peak misidentification, as a means to enable high-quality reporting of liquid chromatography- and gas chromatography-mass spectrometry-based metabolomics-derived data.

    View details for DOI 10.1038/s41592-021-01197-1

    View details for PubMedID 34239102

  • Time-course transcriptome analysis of host cell response to poxvirus infection using a dual long-read sequencing approach. BMC research notes Maroti, Z., Tombacz, D., Prazsak, I., Moldovan, N., Csabai, Z., Torma, G., Balazs, Z., Kalmar, T., Denes, B., Snyder, M., Boldogkoi, Z. 2021; 14 (1): 239

    Abstract

    OBJECTIVE: In this study, we applied two long-read sequencing (LRS) approaches, including single-molecule real-time and nanopore-based sequencing methods, to investigate the time-lapse transcriptome patterns of host gene expression as a response to Vaccinia virus infection. Transcriptomes determined using short-read sequencing approaches are incomplete because these platforms are inefficient or fail to distinguish between polycistronic RNAs, transcript isoforms, transcriptional start sites, as well as transcriptional readthroughs and overlaps. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. RESULTS: In this work, we identified a number of novel transcripts and transcript isoforms of Chlorocebus sabaeus. Additionally, analysis of the most abundant 768 host transcripts revealed a significant overrepresentation of the class of genes in the "regulation of signaling receptor activity" Gene Ontology annotation as a result of viral infection.

    View details for DOI 10.1186/s13104-021-05657-x

    View details for PubMedID 34167576

  • AdaTiSS: A Novel Data-Adaptive Robust Method for Identifying Tissue Specificity Scores. Bioinformatics (Oxford, England) Wang, M., Jiang, L., Snyder, M. P. 2021

    Abstract

    MOTIVATION: Accurately detecting tissue specificity (TS) in genes helps researchers understand tissue functions at the molecular level. The Genotype-Tissue Expression project is one of the publicly available data resources, providing large-scale gene expression data across multiple tissue types. Multiple tissue comparisons and heterogeneous tissue expression make it challenging to accurately identify tissue-specific gene expression. Distinguishing inlier expression from outlier expression is important for building population-level information and further quantifying TS. A robust and data-adaptive TS method that takes into account heterogeneities of the data is still lacking. METHODS: We found that the key to identifying tissue-specific gene expression is to properly define a concept of expression population. In a linear regression problem, we developed a novel data-adaptive robust estimation based on density-power-weight under unknown outlier distribution and non-vanishing outlier proportion. The Gaussian-population mixture model was considered in the setting of identifying TS. We took into account heterogeneities of gene expression and applied the robust data-adaptive procedure to estimate the population parameters. With the well-estimated population parameters, we constructed the AdaTiSS algorithm. RESULTS: Our AdaTiSS profiled TS for each gene and each tissue, which standardized the gene expression in terms of TS. We provided a new robust and powerful tool to the literature of defining tissue specificity. AVAILABILITY: https://github.com/mwgrassgreen/AdaTiSS.

    View details for DOI 10.1093/bioinformatics/btab460

    View details for PubMedID 34146104

  • Precision neoantigen discovery using large-scale immunopeptidomes and composite modeling of MHC peptide presentation. Molecular & cellular proteomics : MCP Pyke, R. M., Mellacheruvu, D., Dea, S., Abbott, C., Zhang, S. V., Phillips, N. A., Harris, J., Bartha, G., Desai, S., McClory, R., West, J., Snyder, M. P., Chen, R., Boyle, S. M. 2021: 100111

    Abstract

    Major histocompatibility complex (MHC)-bound peptides that originate from tumor-specific genetic alterations, known as neoantigens, are an important class of anti-cancer therapeutic targets. Accurately predicting peptide presentation by MHC complexes is a key aspect of discovering therapeutically relevant neoantigens. Technological improvements in mass-spectrometry-based immunopeptidomics and advanced modeling techniques have vastly improved MHC presentation prediction over the past two decades. However, improvement in the sensitivity and specificity of prediction algorithms is needed for clinical applications such as the development of personalized cancer vaccines, the discovery of biomarkers for response to checkpoint blockade and the quantification of autoimmune risk in gene therapies. Toward this end, we generated allele-specific immunopeptidomics data using 25 mono-allelic cell lines and created Systematic HLA Epitope Ranking Pan Algorithm (SHERPA), a pan-allelic MHC-peptide algorithm for predicting MHC-peptide binding and presentation. In contrast to previously published large-scale mono-allelic data, we used an HLA-null K562 parental cell line and a stable transfection of HLA alleles to better emulate native presentation. Our dataset includes five previously unprofiled alleles that expand MHC binding pocket diversity in the training data and extend allelic coverage in underprofiled populations. To improve generalizability, SHERPA systematically integrates 128 mono-allelic and 384 multi-allelic samples with publicly available immunoproteomics data and binding assay data. Using this dataset, we developed two features that empirically estimate the propensities of genes and specific regions within gene bodies to engender immunopeptides to represent antigen processing. Using a composite model constructed with gradient boosting decision trees, multi-allelic deconvolution and 2.15 million peptides encompassing 167 alleles, we achieved a 1.44-fold improvement of positive predictive value compared to existing tools when evaluated on independent mono-allelic datasets and a 1.15-fold improvement when evaluating on tumor samples. With a high degree of accuracy, SHERPA has the potential to enable precision neoantigen discovery for future clinical applications.

    View details for DOI 10.1016/j.mcpro.2021.100111

    View details for PubMedID 34126241

  • Non-invasive wearables for remote monitoring of HbA1c and glucose variability: proof of concept. BMJ open diabetes research & care Bent, B., Cho, P. J., Wittmann, A., Thacker, C., Muppidi, S., Snyder, M., Crowley, M. J., Feinglos, M., Dunn, J. P. 2021; 9 (1)

    Abstract

    Diabetes prevalence continues to grow and there remains a significant diagnostic gap in one-third of the US population that has pre-diabetes. Innovative, practical strategies to improve monitoring of glycemic health are desperately needed. In this proof-of-concept study, we explore the relationship between non-invasive wearables and glycemic metrics and demonstrate the feasibility of using non-invasive wearables to estimate glycemic metrics, including hemoglobin A1c (HbA1c) and glucose variability metrics. We recorded over 25 000 measurements from a continuous glucose monitor (CGM) with simultaneous wrist-worn wearable (skin temperature, electrodermal activity, heart rate, and accelerometry sensors) data over 8-10 days in 16 participants with normal glycemic state and pre-diabetes (HbA1c 5.2-6.4). We used data from the wearable to develop machine learning models to predict HbA1c recorded on day 0 and glucose variability calculated from the CGM. We tested the accuracy of the HbA1c model on a retrospective, external validation cohort of 10 additional participants and compared results against CGM-based HbA1c estimation models. A total of 250 days of data from 26 participants were collected. Out of the 27 models of glucose variability metrics that we developed using non-invasive wearables, 11 of the models achieved high accuracy (<10% mean average per cent error, MAPE). Our HbA1c estimation model using non-invasive wearables data achieved MAPE of 5.1% on an external validation cohort. The ranking of wearable sensor's importance in estimating HbA1c was skin temperature (33%), electrodermal activity (28%), accelerometry (25%), and heart rate (14%). This study demonstrates the feasibility of using non-invasive wearables to estimate glucose variability metrics and HbA1c for glycemic monitoring and investigates the relationship between non-invasive wearables and the glycemic metrics of glucose variability and HbA1c. The methods used in this study can be used to inform future studies confirming the results of this proof-of-concept study.

    View details for DOI 10.1136/bmjdrc-2020-002027

    View details for PubMedID 36170350

  • Physical exercise is a risk factor for amyotrophic lateral sclerosis: Convergent evidence from Mendelian randomisation, transcriptomics and risk genotypes. EBioMedicine Julian, T. H., Glascow, N., Barry, A. D., Moll, T., Harvey, C., Klimentidis, Y. C., Newell, M., Zhang, S., Snyder, M. P., Cooper-Knock, J., Shaw, P. J. 2021; 68: 103397

    Abstract

    BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a universally fatal neurodegenerative disease. ALS is determined by gene-environment interactions and improved understanding of these interactions may lead to effective personalised medicine. The role of physical exercise in the development of ALS is currently controversial. METHODS: First, we dissected the exercise-ALS relationship in a series of two-sample Mendelian randomisation (MR) experiments. Next we tested for enrichment of ALS genetic risk within exercise-associated transcriptome changes. Finally, we applied a validated physical activity questionnaire in a small cohort of genetically selected ALS patients. FINDINGS: We present MR evidence supporting a causal relationship between genetic liability to frequent and strenuous leisure-time exercise and ALS using a liberal instrument (multiplicative random effects IVW, p=0.01). Transcriptomic analysis revealed that genes with altered expression in response to acute exercise are enriched with known ALS risk genes (permutation test, p=0.013) including C9ORF72, and with ALS-associated rare variants of uncertain significance. Questionnaire evidence revealed that age of onset is inversely proportional to historical physical activity for C9ORF72-ALS (Cox proportional hazards model, Wald test p=0.007, likelihood ratio test p=0.01, concordance=74%) but not for non-C9ORF72-ALS. Variability in average physical activity was lower in C9ORF72-ALS compared to both non-C9ORF72-ALS (F-test, p=0.002) and neurologically normal controls (F-test, p=0.049) which is consistent with a homogeneous effect of physical activity in all C9ORF72-ALS patients. INTERPRETATION: Our MR approach suggests a positive causal relationship between ALS and physical exercise. Exercise is likely to cause motor neuron injury only in patients with a risk-genotype. Consistent with this we have shown that ALS risk genes are activated in response to exercise. In particular, we propose that G4C2-repeat expansion of C9ORF72 predisposes to exercise-induced ALS. FUNDING: We acknowledge support from the Wellcome Trust (JCK, 216596/Z/19/Z), NIHR (PJS, NF-SI-0617-10077; IS-BRC-1215-20017) and NIH (MPS, CEGS5P50HG00773504, 1P50HL083800, 1R01HL101388, 1R01-HL122939, S10OD025212, P30DK116074, and UM1HG009442).

    View details for DOI 10.1016/j.ebiom.2021.103397

    View details for PubMedID 34051439

  • Association of HLA loss of heterozygosity with allele-specific neoantigen expansion in response to immunotherapy. Pyke, R., Abbott, C., Dea, S., Bedi, N., Colevas, A., Levy, E., Zhang, S. V., Snyder, M., Mellacheruvu, D., Sunwoo, J. B., Chen, R., Boyle, S. LIPPINCOTT WILLIAMS & WILKINS. 2021
  • Robust prediction of response to immunotherapy in a mixed cohort of previously treated and immunotherapy-naive melanoma patients. Abbott, C., McDaniel, L., Pyke, R., Levy, E., Navarro, F., Wang, S., McClory, R., Snyder, M., Jang, S., Boyle, S., Chen, R. LIPPINCOTT WILLIAMS & WILKINS. 2021
  • Integrated trajectories of the maternal metabolome, proteome, and immunome predict labor onset. Science translational medicine Stelzer, I. A., Ghaemi, M. S., Han, X., Ando, K., Hedou, J. J., Feyaerts, D., Peterson, L. S., Rumer, K. K., Tsai, E. S., Ganio, E. A., Gaudilliere, D. K., Tsai, A. S., Choisy, B., Gaigne, L. P., Verdonk, F., Jacobsen, D., Gavasso, S., Traber, G. M., Ellenberger, M., Stanley, N., Becker, M., Culos, A., Fallahzadeh, R., Wong, R. J., Darmstadt, G. L., Druzin, M. L., Winn, V. D., Gibbs, R. S., Ling, X. B., Sylvester, K., Carvalho, B., Snyder, M. P., Shaw, G. M., Stevenson, D. K., Contrepois, K., Angst, M. S., Aghaeepour, N., Gaudilliere, B. 2021; 13 (592)

    Abstract

    Estimating the time of delivery is of high clinical importance because pre- and postterm deviations are associated with complications for the mother and her offspring. However, current estimations are inaccurate. As pregnancy progresses toward labor, major transitions occur in fetomaternal immune, metabolic, and endocrine systems that culminate in birth. The comprehensive characterization of maternal biology that precedes labor is key to understanding these physiological transitions and identifying predictive biomarkers of delivery. Here, a longitudinal study was conducted in 63 women who went into labor spontaneously. More than 7000 plasma analytes and peripheral immune cell responses were analyzed using untargeted mass spectrometry, aptamer-based proteomic technology, and single-cell mass cytometry in serial blood samples collected during the last 100 days of pregnancy. The high-dimensional dataset was integrated into a multiomic model that predicted the time to spontaneous labor [R = 0.85, 95% confidence interval (CI) [0.79 to 0.89], P = 1.2 × 10⁻⁴⁰, N = 53, training set; R = 0.81, 95% CI [0.61 to 0.91], P = 3.9 × 10⁻⁷, N = 10, independent test set]. Coordinated alterations in maternal metabolome, proteome, and immunome marked a molecular shift from pregnancy maintenance to prelabor biology 2 to 4 weeks before delivery. A surge in steroid hormone metabolites and interleukin-1 receptor type 4 that preceded labor coincided with a switch from immune activation to regulation of inflammatory responses. Our study lays the groundwork for developing blood-based methods for predicting the day of labor, anchored in mechanisms shared in preterm and term pregnancies.

    View details for DOI 10.1126/scitranslmed.abd9898

    View details for PubMedID 33952678

  • A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nature genetics Wainberg, M., Kamber, R. A., Balsubramani, A., Meyers, R. M., Sinnott-Armstrong, N., Hornburg, D., Jiang, L., Chan, J., Jian, R., Gu, M., Shcherbina, A., Dubreuil, M. M., Spees, K., Meuleman, W., Snyder, M. P., Bassik, M. C., Kundaje, A. 2021

    Abstract

    A central question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into 'co-essential' pathways, but this approach has been limited by ubiquitous false positives. In the present study, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This atlas recapitulates diverse pathways and protein complexes, and predicts the functions of 108 uncharacterized genes. Validating top predictions, we show that TMEM189 encodes plasmanylethanolamine desaturase, a key enzyme for plasmalogen synthesis. We also show that C15orf57 encodes a protein that binds the AP2 complex, localizes to clathrin-coated pits and enables efficient transferrin uptake. Finally, we provide an interactive webtool for the community to explore our results, which establish co-essentiality profiling as a powerful resource for biological pathway identification and discovery of new gene functions.

    View details for DOI 10.1038/s41588-021-00840-z

    View details for PubMedID 33859415

  • Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease. Cell de Goede, O. M., Nachun, D. C., Ferraro, N. M., Gloudemans, M. J., Rao, A. S., Smail, C., Eulalio, T. Y., Aguet, F., Ng, B., Xu, J., Barbeira, A. N., Castel, S. E., Kim-Hellmuth, S., Park, Y., Scott, A. J., Strober, B. J., GTEx Consortium, Brown, C. D., Wen, X., Hall, I. M., Battle, A., Lappalainen, T., Im, H. K., Ardlie, K. G., Mostafavi, S., Quertermous, T., Kirkegaard, K., Montgomery, S. B., Anand, S., Gabriel, S., Getz, G. A., Graubert, A., Hadley, K., Handsaker, R. E., Huang, K. H., Li, X., MacArthur, D. G., Meier, S. R., Nedzel, J. L., Nguyen, D. T., Segre, A. V., Todres, E., Balliu, B., Bonazzola, R., Brown, A., Conrad, D. F., Cotter, D. J., Cox, N., Das, S., Dermitzakis, E. T., Einson, J., Engelhardt, B. E., Eskin, E., Flynn, E. D., Fresard, L., Gamazon, E. R., Garrido-Martin, D., Gay, N. R., Guigo, R., Hamel, A. R., He, Y., Hoffman, P. J., Hormozdiari, F., Hou, L., Jo, B., Kasela, S., Kashin, S., Kellis, M., Kwong, A., Li, X., Liang, Y., Mangul, S., Mohammadi, P., Munoz-Aguirre, M., Nobel, A. B., Oliva, M., Park, Y., Parsana, P., Reverter, F., Rouhana, J. M., Sabatti, C., Saha, A., Stephens, M., Stranger, B. E., Teran, N. A., Vinuela, A., Wang, G., Wright, F., Wucher, V., Zou, Y., Ferreira, P. G., Li, G., Mele, M., Yeger-Lotem, E., Bradbury, D., Krubit, T., McLean, J. A., Qi, L., Robinson, K., Roche, N. V., Smith, A. M., Tabor, D. E., Undale, A., Bridge, J., Brigham, L. E., Foster, B. A., Gillard, B. M., Hasz, R., Hunter, M., Johns, C., Johnson, M., Karasik, E., Kopen, G., Leinweber, W. F., McDonald, A., Moser, M. T., Myer, K., Ramsey, K. D., Roe, B., Shad, S., Thomas, J. A., Walters, G., Washington, M., Wheeler, J., Jewell, S. D., Rohrer, D. C., Valley, D. R., Davis, D. A., Mash, D. C., Barcus, M. E., Branton, P. A., Sobin, L., Barker, L. K., Gardiner, H. M., Mosavel, M., Siminoff, L. A., Flicek, P., Haeussler, M., Juettemann, T., Kent, W. J., Lee, C. M., Powell, C. C., Rosenbloom, K. R., Ruffier, M., Sheppard, D., Taylor, K., Trevanion, S. J., Zerbino, D. R., Abell, N. S., Akey, J., Chen, L., Demanelis, K., Doherty, J. A., Feinberg, A. P., Hansen, K. D., Hickey, P. F., Jasmine, F., Jiang, L., Kaul, R., Kibriya, M. G., Li, J. B., Li, Q., Lin, S., Linder, S. E., Pierce, B. L., Rizzardi, L. F., Skol, A. D., Smith, K. S., Snyder, M., Stamatoyannopoulos, J., Tang, H., Wang, M., Carithers, L. J., Guan, P., Koester, S. E., Little, A. R., Moore, H. M., Nierras, C. R., Rao, A. K., Vaught, J. B., Volpi, S. 2021

    Abstract

    Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

    View details for DOI 10.1016/j.cell.2021.03.050

    View details for PubMedID 33864768

  • iNetModels 2.0: an interactive visualization and database of multi-omics data. Nucleic acids research Arif, M., Zhang, C., Li, X., Gungor, C., Cakmak, B., Arslanturk, M., Tebani, A., Ozcan, B., Subas, O., Zhou, W., Piening, B., Turkez, H., Fagerberg, L., Price, N., Hood, L., Snyder, M., Nielsen, J., Uhlen, M., Mardinoglu, A. 2021

    Abstract

    It is essential to reveal the associations between various omics data for a comprehensive understanding of the altered biological process in human wellness and disease. To date, very few studies have focused on collecting and exhibiting multi-omics associations in a single database. Here, we present iNetModels, an interactive database and visualization platform of Multi-Omics Biological Networks (MOBNs). This platform describes the associations between the clinical chemistry, anthropometric parameters, plasma proteomics, plasma metabolomics, as well as metagenomics for oral and gut microbiome obtained from the same individuals. Moreover, iNetModels includes tissue- and cancer-specific Gene Co-expression Networks (GCNs) for exploring the connections between the specific genes. This platform allows the user to interactively explore a single feature's association with other omics data and customize its particular context (e.g. male/female specific). The users can also register their data for sharing and visualization of the MOBNs and GCNs. Moreover, iNetModels allows users who do not have a bioinformatics background to facilitate human wellness and disease research. iNetModels can be accessed freely at https://inetmodels.com without any limitation.

    View details for DOI 10.1093/nar/gkab254

    View details for PubMedID 33849075

  • Inherited causes of clonal haematopoiesis in 97,691 whole genomes (vol 586, pg 763, 2020) NATURE Bick, A. G., Weinstock, J. S., Nandakumar, S. K., Fulco, C. P., Bao, E. L., Zekavat, S. M., Szeto, M. D., Liao, X., Leventhal, M. J., Nasser, J., Chang, K., Laurie, C., Burugula, B., Gibson, C. J., Niroula, A., Lin, A. E., Taub, M. A., Aguet, F., Ardlie, K., Mitchell, B. D., Barnes, K. C., Moscati, A., Fornage, M., Redline, S., Psaty, B. M., Silverman, E. K., Weiss, S. T., Palmer, N. D., Vasan, R. S., Burchard, E. G., Kardia, S. R., He, J., Kaplan, R. C., Smith, N. L., Arnett, D. K., Schwartz, D. A., Correa, A., de Andrade, M., Guo, X., Konkle, B. A., Custer, B., Peralta, J. M., Gui, H., Meyers, D. A., McGarvey, S. T., Chen, I., Shoemaker, M., Peyser, P. A., Broome, J. G., Gogarten, S. M., Wang, F., Wong, Q., Montasser, M. E., Daya, M., Kenny, E. E., North, K. E., Launer, L. J., Cade, B. E., Bis, J. C., Cho, M. H., Lasky-Su, J., Bowden, D. W., Cupples, L., Mak, A. Y., Becker, L. C., Smith, J. A., Kelly, T. N., Aslibekyan, S., Heckbert, S. R., Tiwari, H. K., Yang, I. V., Heit, J. A., Lubitz, S. A., Johnsen, J. M., Curran, J. E., Wenzel, S. E., Weeks, D. E., Rao, D. C., Darbar, D., Moon, J., Tracy, R. P., Buth, E. J., Rafaels, N., Loos, R. F., Durda, P., Liu, Y., Hou, L., Lee, J., Kachroo, P., Freedman, B. I., Levy, D., Bielak, L. F., Hixson, J. E., Floyd, J. S., Whitsel, E. A., Ellinor, P. T., Irvin, M. R., Fingerlin, T. E., Raffield, L. M., Armasu, S. M., Wheeler, M. M., Sabino, E. C., Blangero, J., Williams, L., Levy, B. D., Sheu, W., Roden, D. M., Boerwinkle, E., Manson, J. E., Mathias, R. A., Desai, P., Taylor, K. D., Johnson, A. D., Auer, P. L., Kooperberg, C., Laurie, C. C., Blackwell, T. W., Smith, A. V., Zhao, H., Lange, E., Lange, L., Rich, S. S., Rotter, J. I., Wilson, J. G., Scheet, P., Kitzman, J. O., Lander, E. S., Engreitz, J. M., Ebert, B. L., Reiner, A. P., Jaiswal, S., Abecasis, G., Sankaran, V. G., Kathiresan, S., Natarajan, P., NHLBI Trans Omi 2021; 591 (7851): E27
  • ALDH1A3 Coordinates Metabolism with Gene Regulation in Pulmonary Arterial Hypertension. Circulation Li, D., Shao, N., Moonen, J., Zhao, Z., Shi, M., Otsuki, S., Wang, L., Nguyen, T., Yan, E., Marciano, D. P., Contrepois, K., Li, C. G., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2021

    Abstract

    Background: Metabolic alterations provide substrates that influence chromatin structure to regulate gene expression that determines cell function in health and disease. Heightened proliferation of smooth muscle cells (SMC) leading to the formation of a neointima is a feature of pulmonary arterial hypertension (PAH) and systemic vascular disease. Increased glycolysis is linked to the proliferative phenotype of these SMC. Methods: RNA Sequencing was applied to pulmonary arterial (PA) SMC from PAH patients with and without a BMPR2 mutation vs. control PASMC to uncover genes required for their heightened proliferation and glycolytic metabolism. Assessment of differentially expressed genes established metabolism as a major pathway, and the most highly upregulated metabolic gene in PAH PASMC was aldehyde dehydrogenase family 1 member 3 (ALDH1A3), an enzyme previously linked to glycolysis and proliferation in cancer cells and systemic vascular SMC. We determined if these functions are ALDH1A3-dependent in PAH PASMC, and if ALDH1A3 is required for the development of pulmonary hypertension in a transgenic mouse. Nuclear localization of ALDH1A3 in PAH PASMC led us to determine whether and how this enzyme coordinately regulates gene expression and metabolism in PAH PASMC. Results: ALDH1A3 mRNA and protein were increased in PAH vs control PASMC, and ALDH1A3 was required for their highly proliferative and glycolytic properties. Mice with Aldh1a3 deleted in SMC did not develop hypoxia-induced PA muscularization or pulmonary hypertension. Nuclear ALDH1A3 converted acetaldehyde to acetate to produce acetyl-CoA to acetylate H3K27, marking active enhancers. This allowed for chromatin modification at nuclear factor Y (NFY)A binding sites via the acetyltransferase KAT2B and permitted NFY mediated transcription of cell cycle and metabolic genes that is required for ALDH1A3-dependent proliferation and glycolysis. Loss of BMPR2 in PAH SMC with or without a mutation upregulated ALDH1A3, and transcription of NFYA and ALDH1A3 in PAH PASMC was beta-catenin dependent. Conclusions: Our studies have uncovered a metabolic-transcriptional axis explaining how dividing cells use ALDH1A3 to coordinate their energy needs with the epigenetic and transcriptional regulation of genes required for SMC proliferation. They suggest that selectively disrupting the pivotal role of ALDH1A3 in PAH SMC, but not EC, is an important therapeutic consideration.

    View details for DOI 10.1161/CIRCULATIONAHA.120.048845

    View details for PubMedID 33764154

  • Understanding how biologic and social determinants affect disparities in preterm birth and outcomes of preterm infants in the NICU. Seminars in perinatology Stevenson, D. K., Aghaeepour, N., Maric, I., Angst, M. S., Darmstadt, G. L., Druzin, M. L., Gaudilliere, B., Ling, X. B., Moufarrej, M. N., Peterson, L. S., Quake, S. R., Relman, D. A., Snyder, M. P., Sylvester, K. G., Shaw, G. M., Wong, R. J. 2021: 151408

    Abstract

    To understand the disparities in spontaneous preterm birth (sPTB) and/or its outcomes, biologic and social determinants as well as healthcare practice (such as those in neonatal intensive care units) should be considered. They have been largely intractable and remain obscure in most cases, despite a myriad of identified risk factors for and causes of sPTB. We still do not know how they might actually affect and lead to the different outcomes at different gestational ages and if they are independent of NICU practices. Here we describe an integrated approach to study the interplay between the genome and exposome, which may drive biochemistry and physiology, with health disparities.

    View details for DOI 10.1016/j.semperi.2021.151408

    View details for PubMedID 33875265

  • Early Detection of SARS-CoV-2 and other Infections in Solid Organ Transplant Recipients and Household Members using Wearable Devices. Transplant international : official journal of the European Society for Organ Transplantation Keating, B. J., Mukhtar, E. H., Elftmann, E. D., Eweje, F. R., Gao, H., Ibrahim, L. I., Kathawate, R. G., Lee, A. C., Li, E. H., Moore, K. A., Nair, N., Chaluvadi, V., Reason, J., Zanoni, F., Honkala, A. T., Al-Ali, A. K., Alrubaish, F. A., Ahmad Al-Mozaini, M., Al-Muhanna, F. A., Al-Romaih, K., Goldfarb, S. B., Kellogg, R., Kiryluk, K., Kizilbash, S. J., Kohut, T. J., Kumar, J., O'Connor, M. J., Rand, E. B., Redfield, R. R., Rolnik, B., Rossano, J., Sanchez, P. G., Alavi, A., Bahmani, A., Bogu, G. K., Brooks, A. W., Metwally, A. A., Mishra, T., Marks, S. D., Montgomery, R. A., Fishman, J. A., Amaral, S., Jacobson, P. A., Wang, M., Snyder, M. P. 2021

    Abstract

    The increasing global prevalence of SARS-CoV-2 and the resulting COVID-19 disease pandemic pose significant concerns for clinical management of solid organ transplant recipients (SOTR). Wearable devices that can measure physiologic changes in biometrics including heart rate, heart rate variability, body temperature, respiratory, activity (such as steps taken per day) and sleep patterns and blood oxygen saturation, show utility for the early detection of infection before clinical presentation of symptoms. Recent algorithms developed using preliminary wearable datasets show that SARS-CoV-2 is detectable before clinical symptoms in >80% of adults. Early detection of SARS-CoV-2, influenza, and other pathogens in SOTR, and their household members, could facilitate early interventions such as self-isolation and early clinical management of relevant infection(s). Ongoing studies testing the utility of wearable devices such as smartwatches for early detection of SARS-CoV-2 and other infections in the general population are reviewed here, along with the practical challenges to implementing these processes at scale in pediatric and adult SOTR, and their household members. The resources and logistics, including transplant specific analyses pipelines to account for confounders such as polypharmacy and comorbidities, required in studies of pediatric and adult SOTR for the robust early detection of SARS-CoV-2 and other infections are also reviewed.

    View details for DOI 10.1111/tri.13860

    View details for PubMedID 33735480

  • Hummingbird: Efficient Performance Prediction for Executing Genomic Applications in the Cloud. Bioinformatics (Oxford, England) Bahmani, A., Xing, Z., Krishnan, V., Ray, U., Mueller, F., Alavi, A., Tsao, P. S., Snyder, M. P., Pan, C. 2021

    Abstract

    MOTIVATION: A major drawback of executing genomic applications on cloud computing facilities is the lack of tools to predict which instance type is the most appropriate, often resulting in an over- or under-matching of resources. Determining the right configuration before actually running the applications will save money and time. Here, we introduce Hummingbird, a tool for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms. RESULTS: Our experiments on three major genomic data pipelines, including GATK HaplotypeCaller, GATK MuTect2, and ENCODE ATAC-seq, showed that Hummingbird was able to address applications in command line specified in JSON format or workflow description language (WDL) format, and accurately predicted the fastest, the cheapest, and the most cost-efficient compute instances in an economic manner. AVAILABILITY: Hummingbird is available as an open source tool at: https://github.com/StanfordBioinformatics/Hummingbird. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btab161

    View details for PubMedID 33693476

  • Response to Hulman and colleagues regarding "Glucotypes reveal new patterns of glucose dysregulation". PLoS biology Breschi, A., Perelman, D., Snyder, M. P. 2021; 19 (3): e3001092

    Abstract

    In a response to a Formal Comment critiquing their model for classifying individualized glucose patterns into glucotypes, these authors stand by their results and conclusions, which can be reproduced using their publicly available data, and maintain that improved algorithms for analyzing CGM data will continue to emerge and enrich the field.

    View details for DOI 10.1371/journal.pbio.3001092

    View details for PubMedID 33705379

  • Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays. BMC bioinformatics Krishnan, V., Utiramerur, S., Ng, Z., Datta, S., Snyder, M. P., Ashley, E. A. 2021; 22 (1): 85

    Abstract

    BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods and experienced bioinformaticians to interpret the results. In addition, these methods are dependent on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment, when any of the underlying dependencies change in the assay. Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples. RESULTS: The benchmarking workflow was used to evaluate the performance characteristics for secondary analysis pipelines commonly used by Clinical Genomics laboratories in their LDT assays such as the GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference sample truth sets generated by Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as whole exome and the clinical exome. CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.

    View details for DOI 10.1186/s12859-020-03934-3

    View details for PubMedID 33627090

  • An Integrated Sequencing Approach for Updating the Pseudorabies Virus Transcriptome. Pathogens (Basel, Switzerland) Torma, G., Tombacz, D., Csabai, Z., Gobhardter, D., Deim, Z., Snyder, M., Boldogkoi, Z. 2021; 10 (2)

    Abstract

    In the last couple of years, the implementation of long-read sequencing (LRS) technologies for transcriptome profiling has uncovered an extreme complexity of viral gene expression. In this study, we carried out a systematic analysis on the pseudorabies virus transcriptome by combining our current data obtained by using Pacific Biosciences Sequel and Oxford Nanopore Technologies MinION sequencing with our earlier data generated by other LRS and short-read sequencing techniques. As a result, we identified a number of novel genes, transcripts, and transcript isoforms, including splice and length variants, and also confirmed earlier annotated RNA molecules. One of the major findings of this study is the discovery of a large number of 5'-truncations of larger putative mRNAs being 3'-co-terminal with canonical mRNAs of PRV. A large fraction of these putative RNAs contain in-frame ATGs, which might initiate translation of N-terminally truncated polypeptides. Our analyses indicate that CTO-S, a replication origin-associated RNA molecule is expressed at an extremely high level. This study demonstrates that the PRV transcriptome is much more complex than previously appreciated.

    View details for DOI 10.3390/pathogens10020242

    View details for PubMedID 33672563

  • Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature Taliun, D., Harris, D. N., Kessler, M. D., Carlson, J., Szpiech, Z. A., Torres, R., Taliun, S. A., Corvelo, A., Gogarten, S. M., Kang, H. M., Pitsillides, A. N., LeFaive, J., Lee, S., Tian, X., Browning, B. L., Das, S., Emde, A., Clarke, W. E., Loesch, D. P., Shetty, A. C., Blackwell, T. W., Smith, A. V., Wong, Q., Liu, X., Conomos, M. P., Bobo, D. M., Aguet, F., Albert, C., Alonso, A., Ardlie, K. G., Arking, D. E., Aslibekyan, S., Auer, P. L., Barnard, J., Barr, R. G., Barwick, L., Becker, L. C., Beer, R. L., Benjamin, E. J., Bielak, L. F., Blangero, J., Boehnke, M., Bowden, D. W., Brody, J. A., Burchard, E. G., Cade, B. E., Casella, J. F., Chalazan, B., Chasman, D. I., Chen, Y. I., Cho, M. H., Choi, S. H., Chung, M. K., Clish, C. B., Correa, A., Curran, J. E., Custer, B., Darbar, D., Daya, M., de Andrade, M., DeMeo, D. L., Dutcher, S. K., Ellinor, P. T., Emery, L. S., Eng, C., Fatkin, D., Fingerlin, T., Forer, L., Fornage, M., Franceschini, N., Fuchsberger, C., Fullerton, S. M., Germer, S., Gladwin, M. T., Gottlieb, D. J., Guo, X., Hall, M. E., He, J., Heard-Costa, N. L., Heckbert, S. R., Irvin, M. R., Johnsen, J. M., Johnson, A. D., Kaplan, R., Kardia, S. L., Kelly, T., Kelly, S., Kenny, E. E., Kiel, D. P., Klemmer, R., Konkle, B. A., Kooperberg, C., Kottgen, A., Lange, L. A., Lasky-Su, J., Levy, D., Lin, X., Lin, K., Liu, C., Loos, R. J., Garman, L., Gerszten, R., Lubitz, S. A., Lunetta, K. L., Mak, A. C., Manichaikul, A., Manning, A. K., Mathias, R. A., McManus, D. D., McGarvey, S. T., Meigs, J. B., Meyers, D. A., Mikulla, J. L., Minear, M. A., Mitchell, B. D., Mohanty, S., Montasser, M. E., Montgomery, C., Morrison, A. C., Murabito, J. M., Natale, A., Natarajan, P., Nelson, S. C., North, K. E., O'Connell, J. R., Palmer, N. D., Pankratz, N., Peloso, G. M., Peyser, P. A., Pleiness, J., Post, W. S., Psaty, B. M., Rao, D. C., Redline, S., Reiner, A. P., Roden, D., Rotter, J. 
I., Ruczinski, I., Sarnowski, C., Schoenherr, S., Schwartz, D. A., Seo, J., Seshadri, S., Sheehan, V. A., Sheu, W. H., Shoemaker, M. B., Smith, N. L., Smith, J. A., Sotoodehnia, N., Stilp, A. M., Tang, W., Taylor, K. D., Telen, M., Thornton, T. A., Tracy, R. P., Van Den Berg, D. J., Vasan, R. S., Viaud-Martinez, K. A., Vrieze, S., Weeks, D. E., Weir, B. S., Weiss, S. T., Weng, L., Willer, C. J., Zhang, Y., Zhao, X., Arnett, D. K., Ashley-Koch, A. E., Barnes, K. C., Boerwinkle, E., Gabriel, S., Gibbs, R., Rice, K. M., Rich, S. S., Silverman, E. K., Qasba, P., Gan, W., NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Papanicolaou, G. J., Nickerson, D. A., Browning, S. R., Zody, M. C., Zollner, S., Wilson, J. G., Cupples, L. A., Laurie, C. C., Jaquish, C. E., Hernandez, R. D., O'Connor, T. D., Abecasis, G. R., Abe, N., Almasy, L., Ament, S., Anderson, P., Anugu, P., Applebaum-Bowden, D., Assimes, T., Avramopoulos, D., Barron-Casella, E., Beaty, T., Beck, G., Becker, D., Beitelshees, A., Benos, T., Bezerra, M., Bis, J., Bowler, R., Broeckel, U., Broome, J., Bunting, K., Bustamante, C., Buth, E., Cardwell, J., Carey, V., Carty, C., Casaburi, R., Castaldi, P., Chaffin, M., Chang, C., Chang, Y., Chavan, S., Chen, B., Chen, W., Chuang, L., Chung, R., Comhair, S., Cornell, E., Crandall, C., Crapo, J., Curtis, J., Damcott, C., David, S., Davis, C., Fuentes, L. d., DeBaun, M., Deka, R., Devine, S., Duan, Q., Duggirala, R., Durda, J. P., Eaton, C., Ekunwe, L., El Boueiz, A., Erzurum, S., Farber, C., Flickinger, M., Fornage, M., Frazar, C., Fu, M., Fulton, L., Gao, S., Gao, Y., Gass, M., Gelb, B., Geng, X. P., Geraci, M., Ghosh, A., Gignoux, C., Glahn, D., Gong, D., Goring, H., Graw, S., Grine, D., Gu, C. C., Guan, Y., Gupta, N., Haessler, J., Hawley, N. L., Heavner, B., Herrington, D., Hersh, C., Hidalgo, B., Hixson, J., Hobbs, B., Hokanson, J., Hong, E., Hoth, K., Hsiung, C. A., Hung, Y., Huston, H., Hwu, C. M., Jackson, R., Jain, D., Jhun, M. 
A., Johnson, C., Johnston, R., Jones, K., Kathiresan, S., Khan, A., Kim, W., Kinney, G., Kramer, H., Lange, C., Lange, E., Lange, L., Laurie, C., LeBoff, M., Lee, J., Lee, S. S., Lee, W., Levine, D., Lewis, J., Li, X., Li, Y., Lin, H., Lin, H., Lin, K. H., Liu, S., Liu, Y., Liu, Y., Luo, J., Mahaney, M., Make, B., Manson, J., Margolin, L., Martin, L., Mathai, S., May, S., McArdle, P., McDonald, M., McFarland, S., McGoldrick, D., McHugh, C., Mei, H., Mestroni, L., Min, N., Minster, R. L., Moll, M., Moscati, A., Musani, S., Mwasongwe, S., Mychaleckyj, J. C., Nadkarni, G., Naik, R., Naseri, T., Nekhai, S., Neltner, B., Ochs-Balcom, H., Paik, D., Pankow, J., Parsa, A., Peralta, J. M., Perez, M., Perry, J., Peters, U., Phillips, L. S., Pollin, T., Becker, J. P., Boorgula, M. P., Preuss, M., Qiao, D., Qin, Z., Rafaels, N., Raffield, L., Rasmussen-Torvik, L., Ratan, A., Reed, R., Regan, E., Reupena, M. S., Roselli, C., Russell, P., Ruuska, S., Ryan, K., Sabino, E. C., Saleheen, D., Salimi, S., Salzberg, S., Sandow, K., Sankaran, V. G., Scheller, C., Schmidt, E., Schwander, K., Sciurba, F., Seidman, C., Seidman, J., Sherman, S. L., Shetty, A., Sheu, W. H., Silver, B., Smith, J., Smith, T., Smoller, S., Snively, B., Snyder, M., Sofer, T., Storm, G., Streeten, E., Sung, Y. J., Sylvia, J., Szpiro, A., Sztalryd, C., Tang, H., Taub, M., Taylor, M., Taylor, S., Threlkeld, M., Tinker, L., Tirschwell, D., Tishkoff, S., Tiwari, H., Tong, C., Tsai, M., Vaidya, D., VandeHaar, P., Walker, T., Wallace, R., Walts, A., Wang, F. F., Wang, H., Watson, K., Wessel, J., Williams, K., Williams, L. K., Wilson, C., Wu, J., Xu, H., Yanek, L., Yang, I., Yang, R., Zaghloul, N., Zekavat, M., Zhao, S. X., Zhao, W., Zhi, D., Zhou, X., Zhu, X. 2021; 590 (7845): 290–99

    Abstract

    The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

    View details for DOI 10.1038/s41586-021-03205-y

    View details for PubMedID 33568819

  • Decoding personal biotic and abiotic airborne exposome. Nature protocols Jiang, C., Zhang, X., Gao, P., Chen, Q., Snyder, M. 2021

    Abstract

    The complexity and dynamics of human diseases are driven by the interactions between internal molecular activities and external environmental exposures. Although advances in omics technology have dramatically broadened the understanding of internal molecular and cellular mechanisms, understanding of the external environmental exposures, especially at the personal level, is still rudimentary in comparison. This is largely owing to our limited ability to efficiently collect the personal environmental exposome (PEE) and extract the nucleic acids and chemicals from PEE. Here we describe a protocol that integrates hardware and experimental pipelines to collect and decode biotic and abiotic external exposome at the individual level. The described protocol has several advantages over conventional approaches, such as exposome monitoring at the personal level, decontamination steps to increase sensitivity and simultaneous capture and high-throughput profiling of biotic and abiotic exposures. The protocol takes ~18 h of bench time over 2-3 d to prepare samples for high-throughput profiling and up to a couple of weeks of instrumental time to analyze, depending on the number of samples. Hundreds to thousands of species and organic compounds could be detected in the airborne particulate samples using this protocol. The composition and complexity of the biotic and abiotic substances are heavily influenced by the sampling spatiotemporal factors. Basic skillsets in molecular biology and analytical chemistry are required to carry out this protocol. This protocol could be modified to decode biotic and abiotic substances in other types of low or ultra-low input samples.

    View details for DOI 10.1038/s41596-020-00451-8

    View details for PubMedID 33437065

  • The gut microbiome: a key player in the complexity of amyotrophic lateral sclerosis (ALS). BMC medicine Boddy, S. L., Giovannelli, I., Sassani, M., Cooper-Knock, J., Snyder, M. P., Segal, E., Elinav, E., Barker, L. A., Shaw, P. J., McDermott, C. J. 2021; 19 (1): 13

    Abstract

    Much progress has been made in mapping genetic abnormalities linked to amyotrophic lateral sclerosis (ALS), but the majority of cases still present with no known underlying cause. Furthermore, even in families with a shared genetic abnormality there is significant phenotypic variability, suggesting that non-genetic elements may modify pathogenesis. Identification of such disease-modifiers is important as they might represent new therapeutic targets. A growing body of research has begun to shed light on the role played by the gut microbiome in health and disease with a number of studies linking abnormalities to ALS. The microbiome refers to the genes belonging to the myriad different microorganisms that live within and upon us, collectively known as the microbiota. Most of these microbes are found in the intestines, where they play important roles in digestion and the generation of key metabolites including neurotransmitters. The gut microbiota is an important aspect of the environment in which our bodies operate and inter-individual differences may be key to explaining the different disease outcomes seen in ALS. Work has begun to investigate animal models of the disease, and the gut microbiomes of people living with ALS, revealing changes in the microbial communities of these groups. The current body of knowledge will be summarised in this review. Advances in microbiome sequencing methods will be highlighted, as their improved resolution now enables researchers to further explore differences at a functional level. Proposed mechanisms connecting the gut microbiome to neurodegeneration will also be considered, including direct effects via metabolites released into the host circulation and indirect effects on bioavailability of nutrients and even medications. Profiling of the gut microbiome has the potential to add an environmental component to rapidly advancing studies of ALS genetics and move research a step further towards personalised medicine for this disease. Moreover, should compelling evidence of upstream neurotoxicity or neuroprotection initiated by gut microbiota emerge, modification of the microbiome will represent a potential new avenue for disease modifying therapies. For an intractable condition with few current therapeutic options, further research into the ALS microbiome is of crucial importance.

    View details for DOI 10.1186/s12916-020-01885-3

    View details for PubMedID 33468103

  • The Exposome in the Era of the Quantified Self ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 4 Zhang, X., Gao, P., Snyder, M. P., Altman, R. B. 2021; 4: 255-277
  • Structured elements drive extensive circular RNA translation. Molecular cell Chen, C. K., Cheng, R., Demeter, J., Chen, J., Weingarten-Gabbay, S., Jiang, L., Snyder, M. P., Weissman, J. S., Segal, E., Jackson, P. K., Chang, H. Y. 2021

    Abstract

    The human genome encodes tens of thousands of circular RNAs (circRNAs) with mostly unknown functions. Circular RNAs require internal ribosome entry sites (IRES) if they are to undergo translation without a 5' cap. Here, we develop a high-throughput screen to systematically discover RNA sequences that can direct circRNA translation in human cells. We identify more than 17,000 endogenous and synthetic sequences as candidate circRNA IRES. 18S rRNA complementarity and a structured RNA element positioned on the IRES are important for driving circRNA translation. Ribosome profiling and peptidomic analyses show extensive IRES-ribosome association, hundreds of circRNA-encoded proteins with tissue-specific distribution, and antigen presentation. We find that circFGFR1p, a protein encoded by circFGFR1 that is downregulated in cancer, functions as a negative regulator of FGFR1 oncoprotein to suppress cell growth during stress. Systematic identification of circRNA IRES elements may provide important links among circRNA regulation, biological function, and disease.

    View details for DOI 10.1016/j.molcel.2021.07.042

    View details for PubMedID 34437836

  • Non-invasive wearables for remote monitoring of HbA1c and glucose variability: proof of concept BMJ OPEN DIABETES RESEARCH & CARE Bent, B., Cho, P. J., Wittmann, A., Thacker, C., Muppidi, S., Snyder, M., Crowley, M. J., Feinglos, M., Dunn, J. P. 2021; 9 (1)
  • The X chromosome from telomere to telomere: key achievements and future opportunities. Faculty reviews Heard, E., Johnson, A. D., Korbel, J. O., Lee, C., Snyder, M. P., Sturgill, D. 2021; 10: 63

    Abstract

    While the human genome represents the most accurate vertebrate reference assembly to date, it still contains numerous gaps, including centromeric and other large repeat-containing regions (often termed the "dark side" of the genome), many of which are of fundamental biological importance. Miga et al. present the first gapless assembly of the human X chromosome, with the help of ultra-long-read nanopore reads generated for the haploid complete hydatidiform mole (CHM13) genome. They reconstruct the ~3.1 megabase centromeric satellite DNA array and map DNA methylation patterns across complex tandem repeats and satellite arrays. This Telomere-to-Telomere assembly provides a superior human X chromosome reference enabling future sex-determination and X-linked disease research, and provides a path towards finishing the entire human genome sequence.

    View details for DOI 10.12703/r-01-000001

    View details for PubMedID 35088059

  • AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions. Statistical applications in genetics and molecular biology Wang, M., Jiang, L., Snyder, M. P. 2021

    Abstract

    The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expressions across multiple tissue types. Under various technical noise and unknown or unmeasured factors, how to robustly estimate the major tissue effect becomes challenging. Moreover, different genes exhibit heterogeneous expressions across different tissue types. Therefore, we need a robust method which adapts to the heterogeneities of gene expressions to improve the estimation for the tissue effect. We followed the approach of the robust estimation based on γ-density-power-weight in the works of Fujisawa, H. and Eguchi, S. (2008), Robust parameter estimation with a small bias against heavy contamination, J. Multivariate Anal. 99: 2053-2081, and Windham, M.P. (1995), Robustifying model fitting, J. Roy. Stat. Soc. B: 599-609, where γ is the exponent of density weight which controls the balance between bias and variance. As far as we know, our work is the first to propose a procedure to tune the parameter γ to balance the bias-variance trade-off under the mixture models. We constructed a robust likelihood criterion based on weighted densities in the mixture model of Gaussian population distribution mixed with unknown outlier distribution, and developed a data-adaptive γ-selection procedure embedded into the robust estimation. We provided a heuristic analysis on the selection criterion and found that our practical selection trend under various γ's in average performance has similar capability to capture minimizer γ as the inestimable mean squared error (MSE) trend from our simulation studies under a series of settings. Our data-adaptive robustifying procedure in the linear regression problem (AdaReg) showed a significant advantage in both simulation studies and real data application in estimating tissue effect of heart samples from the GTEx project, compared to the fixed γ procedure and other robust methods. At the end, the paper discussed some limitations on this method and future work.

    View details for DOI 10.1515/sagmb-2020-0042

    View details for PubMedID 34252998
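
The density-power weighting that the abstract above builds on can be illustrated with a small sketch. This is not the AdaReg procedure itself: it assumes a fixed γ (the paper's contribution is choosing γ adaptively), a one-dimensional Gaussian location and scale problem rather than a regression, and the function name `gamma_robust_gaussian` and the toy data are hypothetical. Each point is weighted by its fitted Gaussian density raised to the power γ, so far-away points contribute almost nothing.

```python
import math
from statistics import median


def gamma_robust_gaussian(xs, gamma=0.5, iters=100):
    """Fixed-gamma density-power-weighted estimate of a Gaussian mean and sd.

    Illustrative only: gamma is fixed here, whereas AdaReg selects it from data.
    """
    # Start from a robust location and the raw spread around it.
    mu = median(xs)
    sigma2 = sum((x - mu) ** 2 for x in xs) / len(xs)
    for _ in range(iters):
        # Weight of each point: Gaussian density to the power gamma
        # (the normalising constant cancels in the ratios below).
        w = [math.exp(-gamma * (x - mu) ** 2 / (2.0 * sigma2)) for x in xs]
        total = sum(w)
        mu = sum(wi * x for wi, x in zip(w, xs)) / total
        # The (1 + gamma) factor corrects the shrinkage caused by the weights.
        sigma2 = (1.0 + gamma) * sum(
            wi * (x - mu) ** 2 for wi, x in zip(w, xs)
        ) / total
    return mu, math.sqrt(sigma2)


# Hypothetical toy data: five inliers around 0 and one gross outlier.
values = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
robust_mu, robust_sd = gamma_robust_gaussian(values)
plain_mean = sum(values) / len(values)
```

On this toy input the plain mean is pulled toward the outlier, while the weighted estimate stays near the bulk of the data. A larger γ downweights outliers more aggressively at the cost of higher variance, which is the trade-off the abstract describes.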

  • Exposome-wide Association Study for Metabolic Syndrome. Frontiers in genetics Gao, P., Snyder, M. 2021; 12: 783930

    View details for DOI 10.3389/fgene.2021.783930

    View details for PubMedID 34950191

  • Adapting skills from genetic counseling to wearables technology research during the COVID-19 pandemic: Poised for the pivot. Journal of genetic counseling Higgs, E., Dagan-Rosenfeld, O., Snyder, M. 2021

    Abstract

    Genetic counselors have shown themselves to be adaptable in an evolving profession, with expansion into new sub-specialties, various non-clinical settings, and research roles. The COVID-19 pandemic caused a sudden and drastic shift in healthcare priorities. In an effort to contribute meaningfully to the COVID-19 crisis, and to adapt to a remote- and essential-only research environment, our workplace and thus our roles pivoted from genomics research to remote COVID-19 research using wearables technologies. With a deep understanding of genomic data, we were quickly able to apply similar concepts to wearables data including considering privacy implications, managing uncertain findings, and acknowledging the lack of ethnic diversity in many datasets. By sharing our own experience as an example, we hope individuals trained in genetic counseling may see opportunities for adaptation of their skills into expanding roles.

    View details for DOI 10.1002/jgc4.1509

    View details for PubMedID 34580951

  • Real-time Alerting System for COVID-19 Using Wearable Data. medRxiv : the preprint server for health sciences Alavi, A., Bogu, G. K., Wang, M., Rangan, E. S., Brooks, A. W., Wang, Q., Higgs, E., Celli, A., Mishra, T., Metwally, A. A., Cha, K., Knowles, P., Alavi, A. A., Bhasin, R., Panchamukhi, S., Celis, D., Aditya, T., Honkala, A., Rolnik, B., Hunting, E., Dagan-Rosenfeld, O., Chauhan, A., Li, J. W., Li, X., Bahmani, A., Snyder, M. P. 2021

    Abstract

    Early detection of infectious disease is crucial for reducing transmission and facilitating early intervention. We built a real-time smartwatch-based alerting system for the detection of aberrant physiological and activity signals (e.g. resting heart rate, steps) associated with early infection onset at the individual level. Upon applying this system to a cohort of 3,246 participants, we found that alerts were generated for pre-symptomatic and asymptomatic COVID-19 infections in 78% of cases, and pre-symptomatic signals were observed a median of three days prior to symptom onset. Furthermore, by examining over 100,000 survey annotations, we found that other respiratory infections as well as events not associated with COVID-19 (e.g. stress, alcohol consumption, travel) could trigger alerts, albeit at a lower mean period (1.9 days) than those observed in the COVID-19 cases (4.3 days). Thus, this system has potential both for advance warning of COVID-19 and as a general system for measuring health via detection of physiological shifts from personal baselines. The system is open-source and scalable to millions of users, offering a personal health monitoring system that can operate in real time on a global scale.

    View details for DOI 10.1101/2021.06.13.21258795

    View details for PubMedID 34189532

    View details for PubMedCentralID PMC8240687
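
The idea of detecting shifts from a personal baseline can be sketched in a few lines. This is a minimal illustration, not the published system: the rolling-median baseline, the 7-day window, the 4 bpm threshold, the function name `flag_elevated_days`, and the sample readings are all assumptions made for the example.

```python
from statistics import median


def flag_elevated_days(resting_hr, window=7, threshold_bpm=4.0):
    """Return indices of days whose resting heart rate exceeds the baseline.

    The baseline for day i is the median of the preceding `window` days,
    so each person is compared only with their own recent history.
    """
    alerts = []
    for i in range(window, len(resting_hr)):
        baseline = median(resting_hr[i - window:i])
        if resting_hr[i] - baseline >= threshold_bpm:
            alerts.append(i)
    return alerts


# Hypothetical daily resting heart rates (bpm) with a two-day elevation.
daily_rhr = [60, 61, 59, 60, 62, 60, 61, 60, 66, 68, 61]
print(flag_elevated_days(daily_rhr))  # → [8, 9]
```

Using a median rather than a mean keeps a single unusual day from shifting the baseline, which matters because, as the abstract notes, events such as stress or travel can also raise the signal.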

  • A DMS Shotgun Lipidomics Workflow Application to Facilitate High-Throughput, Comprehensive Lipidomics. Journal of the American Society for Mass Spectrometry Su, B., Bettcher, L. F., Hsieh, W. Y., Hornburg, D., Pearson, M. J., Blomberg, N., Giera, M., Snyder, M. P., Raftery, D., Bensinger, S. J., Williams, K. J. 2021

    Abstract

    Differential mobility spectrometry (DMS) is highly useful for shotgun lipidomic analysis because it overcomes difficulties in measuring isobaric species within a complex lipid sample and allows for acyl tail characterization of phospholipid species. Despite these advantages, the resulting workflow presents technical challenges, including the need to tune the DMS before every batch to update compensation voltage settings within the method. The Sciex Lipidyzer platform uses a Sciex 5500 QTRAP with a DMS (SelexION), an LC system configured for direct infusion experiments, an extensive set of standards designed for quantitative lipidomics, and a software package (Lipidyzer Workflow Manager) that facilitates the workflow and rapidly analyzes the data. Although the Lipidyzer platform remains very useful for DMS-based shotgun lipidomics, the software is no longer updated for current versions of Analyst and Windows. Furthermore, the software is fixed to a single workflow and cannot take advantage of new lipidomics standards or analyze additional lipid species. To address this multitude of issues, we developed Shotgun Lipidomics Assistant (SLA), a Python-based application that facilitates DMS-based lipidomics workflows. SLA provides the user with flexibility in adding and subtracting lipid and standard MRMs. It can report quantitative lipidomics results from raw data in minutes, comparable to the Lipidyzer software. We show that SLA facilitates an expanded lipidomics analysis that measures over 1450 lipid species across 17 (sub)classes. Lastly, we demonstrate that the SLA performs isotope correction, a feature that was absent from the original software.

    View details for DOI 10.1021/jasms.1c00203

    View details for PubMedID 34637296

  • Swarm: A federated cloud framework for large-scale variant analysis. PLoS computational biology Bahmani, A., Ferriter, K., Krishnan, V., Alavi, A., Alavi, A., Tsao, P. S., Snyder, M. P., Pan, C. 2021; 17 (5): e1008977

    Abstract

    Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. We demonstrate its utility via common inquiries of genomic variants across BigQuery in the Google Cloud Platform (GCP), Athena in the Amazon Web Services (AWS), Apache Presto and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays and risks of security breach and privacy violation.

    View details for DOI 10.1371/journal.pcbi.1008977

    View details for PubMedID 33979321

  • Precision medicine in women with epilepsy: The challenge, systematic review, and future direction. Epilepsy & behavior : E&B Li, Y., Zhang, S., Snyder, M. P., Meador, K. J. 2021; 118: 107928

    Abstract

    Epilepsy is one of the most prevalent neurologic conditions, affecting almost 70 million people worldwide. In the United States, 1.3 million women with epilepsy (WWE) are in their active reproductive years. WWE face gender-specific challenges such as pregnancy, seizure exacerbation with hormonal pattern fluctuations, contraception, fertility, and menopause. Precision medicine, which applies state-of-the-art molecular profiling to diagnostic, prognostic, and therapeutic problems, has the potential to advance the care of WWE by precisely tailoring individualized management to each patient's needs. For example, antiseizure medications (ASMs) are among the most common teratogens prescribed to women of childbearing potential. Teratogens act in a dose-dependent manner on a susceptible genotype. However, the genotypes at risk for ASM-induced teratogenic deficits are unknown. Here we summarize current challenging issues for WWE, review the state-of-the-art tools for clinical precision medicine approaches, perform a systematic review of pharmacogenomic approaches in management for WWE, and discuss potential future directions in this field. We envision a future in which precision medicine enables a new practice style that puts focus on early detection, prediction, and targeted therapies for WWE.

    View details for DOI 10.1016/j.yebeh.2021.107928

    View details for PubMedID 33774354

  • CTLA-4 expression by B-1a B cells is essential for immune tolerance. Nature communications Yang, Y., Li, X., Ma, Z., Wang, C., Yang, Q., Byrne-Steele, M., Hong, R., Min, Q., Zhou, G., Cheng, Y., Qin, G., Youngyunpipatkul, J. V., Wing, J. B., Sakaguchi, S., Toonstra, C., Wang, L. X., Vilches-Moure, J. G., Wang, D., Snyder, M. P., Wang, J. Y., Han, J., Herzenberg, L. A. 2021; 12 (1): 525

    Abstract

    CTLA-4 is an important regulator of T-cell function. Here, we report that expression of this immune-regulator in mouse B-1a cells has a critical function in maintaining self-tolerance by regulating these early-developing B cells that express a repertoire enriched for auto-reactivity. Selective deletion of CTLA-4 from B cells results in mice that spontaneously develop autoantibodies, T follicular helper (Tfh) cells and germinal centers (GCs) in the spleen, and autoimmune pathology later in life. This impaired immune homeostasis results from B-1a cell dysfunction upon loss of CTLA-4. Therefore, CTLA-4-deficient B-1a cells up-regulate epigenetic and transcriptional activation programs and show increased self-replenishment. These activated cells further internalize surface IgM, differentiate into antigen-presenting cells and, when reconstituted in normal IgH-allotype congenic recipient mice, induce GCs and Tfh cells expressing a highly selected repertoire. These findings show that CTLA-4 regulation of B-1a cells is a crucial immune-regulatory mechanism.

    View details for DOI 10.1038/s41467-020-20874-x

    View details for PubMedID 33483505

  • Cell-free DNA (cfDNA) and Exosome Profiling from a Year-Long Human Spaceflight Reveals Circulating Biomarkers. iScience Bezdan, D., Grigorev, K., Meydan, C., Pelissier Vatter, F. A., Cioffi, M., Rao, V., MacKay, M., Nakahira, K., Burnham, P., Afshinnekoo, E., Westover, C., Butler, D., Mozsary, C., Donahoe, T., Foox, J., Mishra, T., Lucotti, S., Rana, B. K., Melnick, A. M., Zhang, H., Matei, I., Kelsen, D., Yu, K., Lyden, D. C., Taylor, L., Bailey, S. M., Snyder, M. P., Garrett-Bakelman, F. E., Ossowski, S., De Vlaminck, I., Mason, C. E. 2020; 23 (12): 101844

    Abstract

    Liquid biopsies based on cell-free DNA (cfDNA) or exosomes provide a noninvasive approach to monitor human health and disease but have not been utilized for astronauts. Here, we profile cfDNA characteristics, including fragment size, cellular deconvolution, and nucleosome positioning, in an astronaut during a year-long mission on the International Space Station, compared to his identical twin on Earth and healthy donors. We observed a significant increase in the proportion of cell-free mitochondrial DNA (cf-mtDNA) inflight, and analysis of post-flight exosomes in plasma revealed a 30-fold increase in circulating exosomes and patient-specific protein cargo (including brain-derived peptides) after the year-long mission. This longitudinal analysis of astronaut cfDNA during spaceflight and the exosome profiles highlights their utility for astronaut health monitoring, as well as cf-mtDNA levels as a potential biomarker for physiological stress or immune system responses related to microgravity, radiation exposure, and the other unique environmental conditions of spaceflight.

    View details for DOI 10.1016/j.isci.2020.101844

    View details for PubMedID 33376973

  • Rare Variant Burden Analysis within Enhancers Identifies CAV1 as an ALS Risk Gene. Cell reports Cooper-Knock, J., Zhang, S., Kenna, K. P., Moll, T., Franklin, J. P., Allen, S., Nezhad, H. G., Iacoangeli, A., Yacovzada, N. Y., Eitan, C., Hornstein, E., Ehilak, E., Celadova, P., Bose, D., Farhan, S., Fishilevich, S., Lancet, D., Morrison, K. E., Shaw, C. E., Al-Chalabi, A., Project MinE ALS Sequencing Consortium, Veldink, J. H., Kirby, J., Snyder, M. P., Shaw, P. J., Blair, I., Wray, N., Kiernan, M., Neto, M. M., Chio, A., Cauchi, R., Robberecht, W., van Damme, P., Corcia, P., Couratier, P., Hardiman, O., McLaughlin, R., Gotkine, M., Drory, V., Ticozzi, N., Silani, V., Veldink, J., van den Berg, L., de Carvalho, M., Pardina, J. M., Povedano, M., Andersen, P., Wber, M., Basak, N., Al-Chalabi, A., Shaw, C., Shaw, P., Morrison, K., Landers, J., Glass, J. 2020; 33 (9): 108456

    Abstract

    Amyotrophic lateral sclerosis (ALS) is an incurable neurodegenerative disease. CAV1 and CAV2 organize membrane lipid rafts (MLRs) important for cell signaling and neuronal survival, and overexpression of CAV1 ameliorates ALS phenotypes in vivo. Genome-wide association studies localize a large proportion of ALS risk variants within the non-coding genome, but further characterization has been limited by lack of appropriate tools. By designing and applying a pipeline to identify pathogenic genetic variation within enhancer elements responsible for regulating gene expression, we identify disease-associated variation within CAV1/CAV2 enhancers, which replicate in an independent cohort. Discovered enhancer mutations reduce CAV1/CAV2 expression and disrupt MLRs in patient-derived cells, and CRISPR-Cas9 perturbation proximate to a patient mutation is sufficient to reduce CAV1/CAV2 expression in neurons. Additional enrichment of ALS-associated mutations within CAV1 exons positions CAV1 as an ALS risk gene. We propose CAV1/CAV2 overexpression as a personalized medicine target for ALS.

    View details for DOI 10.1016/j.celrep.2020.108456

    View details for PubMedID 33264630

  • A Customizable Analysis Flow in Integrative Multi-Omics. Biomolecules Lancaster, S. M., Sanghi, A., Wu, S., Snyder, M. P. 2020; 10 (12)

    Abstract

    The number of researchers using multi-omics is growing. Though still expensive, every year it is cheaper to perform multi-omic studies, often exponentially so. In addition to its increasing accessibility, multi-omics reveals a view of systems biology to an unprecedented depth. Thus, multi-omics can be used to answer a broad range of biological questions in finer resolution than previous methods. We used six omic measurements, four nucleic acid based (i.e., genomic, epigenomic, transcriptomic, and metagenomic) and two mass spectrometry based (proteomics and metabolomics), to highlight an analysis workflow on this type of data, which is often vast. This workflow is not exhaustive of all the omic measurements or analysis methods, but it will provide an experienced or even a novice multi-omic researcher with the tools necessary to analyze their data. This review begins with analyzing a single ome and study design, and then synthesizes best practices in data integration techniques that include machine learning. Furthermore, we delineate methods to validate findings from multi-omic integration. Ultimately, multi-omic integration offers a window into the complexity of molecular interactions and a comprehensive view of systems biology.

    View details for DOI 10.3390/biom10121606

    View details for PubMedID 33260881

  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women OBSTETRICAL & GYNECOLOGICAL SURVEY Liang, L., Rasmussen, M., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. Y., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 75 (11): 649–51
  • Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature Bick, A. G., Weinstock, J. S., Nandakumar, S. K., Fulco, C. P., Bao, E. L., Zekavat, S. M., Szeto, M. D., Liao, X., Leventhal, M. J., Nasser, J., Chang, K., Laurie, C., Burugula, B. B., Gibson, C. J., Lin, A. E., Taub, M. A., Aguet, F., Ardlie, K., Mitchell, B. D., Barnes, K. C., Moscati, A., Fornage, M., Redline, S., Psaty, B. M., Silverman, E. K., Weiss, S. T., Palmer, N. D., Vasan, R. S., Burchard, E. G., Kardia, S. L., He, J., Kaplan, R. C., Smith, N. L., Arnett, D. K., Schwartz, D. A., Correa, A., de Andrade, M., Guo, X., Konkle, B. A., Custer, B., Peralta, J. M., Gui, H., Meyers, D. A., McGarvey, S. T., Chen, I. Y., Shoemaker, M. B., Peyser, P. A., Broome, J. G., Gogarten, S. M., Wang, F. F., Wong, Q., Montasser, M. E., Daya, M., Kenny, E. E., North, K. E., Launer, L. J., Cade, B. E., Bis, J. C., Cho, M. H., Lasky-Su, J., Bowden, D. W., Cupples, L. A., Mak, A. C., Becker, L. C., Smith, J. A., Kelly, T. N., Aslibekyan, S., Heckbert, S. R., Tiwari, H. K., Yang, I. V., Heit, J. A., Lubitz, S. A., Johnsen, J. M., Curran, J. E., Wenzel, S. E., Weeks, D. E., Rao, D. C., Darbar, D., Moon, J., Tracy, R. P., Buth, E. J., Rafaels, N., Loos, R. J., Durda, P., Liu, Y., Hou, L., Lee, J., Kachroo, P., Freedman, B. I., Levy, D., Bielak, L. F., Hixson, J. E., Floyd, J. S., Whitsel, E. A., Ellinor, P. T., Irvin, M. R., Fingerlin, T. E., Raffield, L. M., Armasu, S. M., Wheeler, M. M., Sabino, E. C., Blangero, J., Williams, L. K., Levy, B. D., Sheu, W. H., Roden, D. M., Boerwinkle, E., Manson, J. E., Mathias, R. A., Desai, P., Taylor, K. D., Johnson, A. D., NHLBI Trans-Omics for Precision Medicine Consortium, Auer, P. L., Kooperberg, C., Laurie, C. C., Blackwell, T. W., Smith, A. V., Zhao, H., Lange, E., Lange, L., Rich, S. S., Rotter, J. I., Wilson, J. G., Scheet, P., Kitzman, J. O., Lander, E. S., Engreitz, J. M., Ebert, B. L., Reiner, A. P., Jaiswal, S., Abecasis, G., Sankaran, V. 
G., Kathiresan, S., Natarajan, P., Abe, N., Albert, C., Almasy, L., Alonso, A., Ament, S., Anderson, P., Anugu, P., Applebaum-Bowden, D., Arking, D., Ashley-Koch, A., Aslibekyan, S., Assimes, T., Avramopoulos, D., Barnard, J., Barr, R. G., Barron-Casella, E., Barwick, L., Beaty, T., Beck, G., Becker, D., Beer, R., Beitelshees, A., Benjamin, E., Benos, P., Bezerra, M., Bielak, L., Bowler, R., Brody, J., Broeckel, U., Bunting, K., Bustamante, C., Cardwell, J., Carey, V., Carty, C., Casaburi, R., Casella, J., Castaldi, P., Chaffin, M., Chang, C., Chang, Y., Chasman, D., Chavan, S., Chen, B., Chen, W., Choi, S. H., Chuang, L., Chung, M., Chung, R., Clish, C., Comhair, S., Cornell, E., Crandall, C., Crapo, J., Curtis, J., Damcott, C., Das, S., David, S., Davis, C., DeBaun, M., Deka, R., DeMeo, D., Devine, S., Duan, Q., Duggirala, R., Dutcher, S., Eaton, C., Ekunwe, L., Boueiz, A. E., Emery, L., Erzurum, S., Farber, C., Flickinger, M., Franceschini, N., Frazar, C., Fu, M., Fullerton, S. M., Fulton, L., Gabriel, S., Gan, W., Gao, S., Gao, Y., Gass, M., Gelb, B., Priscilla Geng, X., Geraci, M., Germer, S., Gerszten, R., Ghosh, A., Gibbs, R., Gignoux, C., Gladwin, M., Glahn, D., Gong, D., Goring, H., Graw, S., Grine, D., Gu, C. C., Guan, Y., Gupta, N., Haessler, J., Hall, M., Harris, D., Hawley, N. L., Heavner, B., Hernandez, R., Herrington, D., Hersh, C., Hidalgo, B., Hobbs, B., Hokanson, J., Hong, E., Hoth, K., Agnes Hsiung, C., Hung, Y., Huston, H., Hwu, C. M., Jackson, R., Jain, D., Jaquish, C., Jhun, M. A., Johnson, C., Johnston, R., Jones, K., Kang, H. M., Kelly, S., Kessler, M., Khan, A., Kim, W., Kinney, G., Kramer, H., Lange, C., LeBoff, M., Lee, S. S., Lee, W., LeFaive, J., Levine, D., Lewis, J., Li, X., Li, Y., Lin, H., Lin, H., Lin, K. 
H., Lin, X., Liu, S., Liu, Y., Lunetta, K., Luo, J., Mahaney, M., Make, B., Manichaikul, A., Margolin, L., Martin, L., Mathai, S., May, S., McArdle, P., McDonald, M., McFarland, S., McGoldrick, D., McHugh, C., Mei, H., Mestroni, L., Mikulla, J., Min, N., Minear, M., Minster, R. L., Moll, M., Montgomery, C., Musani, S., Mwasongwe, S., Mychaleckyj, J. C., Nadkarni, G., Naik, R., Naseri, T., Nekhai, S., Nelson, S. C., Neltner, B., Nickerson, D., O'Connell, J., O'Connor, T., Ochs-Balcom, H., Paik, D., Pankow, J., Papanicolaou, G., Parsa, A., Perez, M., Perry, J., Peters, U., Peyser, P., Phillips, L. S., Pollin, T., Post, W., Becker, J. P., Boorgula, M. P., Preuss, M., Qasba, P., Qiao, D., Qin, Z., Rasmussen-Torvik, L., Ratan, A., Reed, R., Regan, E., Sefuiva Reupena, M., Rice, K., Roselli, C., Ruczinski, I., Russell, P., Ruuska, S., Ryan, K., Saleheen, D., Salimi, S., Salzberg, S., Sandow, K., Scheller, C., Schmidt, E., Schwander, K., Sciurba, F., Seidman, C., Seidman, J., Sheehan, V., Sherman, S. L., Shetty, A., Shetty, A., Silver, B., Smith, J., Smith, T., Smoller, S., Snively, B., Snyder, M., Sofer, T., Sotoodehnia, N., Stilp, A. M., Storm, G., Streeten, E., Su, J. L., Sung, Y. J., Sylvia, J., Szpiro, A., Sztalryd, C., Taliun, D., Tang, H., Taylor, M., Taylor, S., Telen, M., Thornton, T. A., Threlkeld, M., Tinker, L., Tirschwell, D., Tishkoff, S., Tiwari, H., Tong, C., Tsai, M., Vaidya, D., Berg, D. V., VandeHaar, P., Vrieze, S., Walker, T., Wallace, R., Walts, A., Wang, H., Watson, K., Weir, B., Weng, L., Wessel, J., Willer, C., Williams, K., Wilson, C., Wu, J., Xu, H., Yanek, L., Yang, R., Zaghloul, N., Zhang, Y., Zhao, S. X., Zhao, W., Zhi, D., Zhou, X., Zhu, X., Zody, M., Zoellner, S. 2020

    Abstract

    Age is the dominant risk factor for most chronic human diseases, but the mechanisms through which ageing confers this risk are largely unknown. The age-related acquisition of somatic mutations that lead to clonal expansion in regenerating haematopoietic stem cell populations has recently been associated with both haematological cancer and coronary heart disease; this phenomenon is termed clonal haematopoiesis of indeterminate potential (CHIP). Simultaneous analyses of germline and somatic whole-genome sequences provide the opportunity to identify root causes of CHIP. Here we analyse high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the National Heart, Lung, and Blood Institute Trans-omics for Precision Medicine (TOPMed) programme, and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid and inflammatory traits that are specific to different CHIP driver genes. Association of a genome-wide set of germline genetic variants enabled the identification of three genetic loci associated with CHIP status, including one locus at TET2 that was specific to individuals of African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus enabled the identification of a causal variant that disrupts a TET2 distal enhancer, resulting in increased self-renewal of haematopoietic stem cells. Overall, we observe that germline genetic variation shapes haematopoietic stem cell function, leading to CHIP through mechanisms that are specific to clonal haematopoiesis as well as shared mechanisms that lead to somatic mutations across tissues.

    View details for DOI 10.1038/s41586-020-2819-2

    View details for PubMedID 33057201

  • Quality-control mechanisms targeting translationally stalled and C-terminally extended poly(GR) associated with ALS/FTD. Proceedings of the National Academy of Sciences of the United States of America Li, S., Wu, Z., Tantray, I., Li, Y., Chen, S., Dong, J., Glynn, S., Vogel, H., Snyder, M., Lu, B. 2020

    Abstract

    Maintaining the fidelity of nascent peptide chain (NP) synthesis is essential for proteome integrity and cellular health. Ribosome-associated quality control (RQC) serves to resolve stalled translation, during which untemplated Ala/Thr residues are added C terminally to stalled peptide, as shown during C-terminal Ala and Thr addition (CAT-tailing) in yeast. The mechanism and biological effects of CAT-tailing-like activity in metazoans remain unclear. Here we show that CAT-tailing-like modification of poly(GR), a dipeptide repeat derived from amyotrophic lateral sclerosis with frontotemporal dementia (ALS/FTD)-associated GGGGCC (G4C2) repeat expansion in C9ORF72, contributes to disease. We find that poly(GR) can act as a mitochondria-targeting signal, causing some poly(GR) to be cotranslationally imported into mitochondria. However, poly(GR) translation on mitochondrial surface is frequently stalled, triggering RQC and CAT-tailing-like C-terminal extension (CTE). CTE promotes poly(GR) stabilization, aggregation, and toxicity. Our genetic studies in Drosophila uncovered an important role of the mitochondrial protease YME1L in clearing poly(GR), revealing mitochondria as major sites of poly(GR) metabolism. Moreover, the mitochondria-associated noncanonical Notch signaling pathway impinges on the RQC machinery to restrain poly(GR) accumulation, at least in part through the AKT/VCP axis. The conserved actions of YME1L and noncanonical Notch signaling in animal models and patient cells support their fundamental involvement in ALS/FTD.

    View details for DOI 10.1073/pnas.2005506117

    View details for PubMedID 32958650

  • The GTEx Consortium atlas of genetic regulatory effects across human tissues SCIENCE Aguet, F., Barbeira, A. N., Bonazzola, R., Brown, A., Castel, S. E., Jo, B., Kasela, S., Kim-Hellmuth, S., Liang, Y., Parsana, P., Flynn, E., Fresard, L., Gamazon, E. R., Hamel, A. R., He, Y., Hormozdiari, F., Mohammadi, P., Munoz-Aguirre, M., Ardlie, K. G., Battle, A., Bonazzola, R., Brown, C. D., Cox, N., Dermitzakis, E. T., Engelhardt, B. E., Garrido-Martin, D., Gay, N. R., Getz, G., Guigo, R., Hamel, A. R., Handsaker, R. E., He, Y., Hoffman, P. J., Hormozdiari, F., Im, H., Jo, B., Kasela, S., Kashin, S., Kim-Hellmuth, S., Kwong, A., Lappalainen, T., Li, X., Liang, Y., MacArthur, D. G., Mohammadi, P., Montgomery, S. B., Munoz-Aguirre, M., Rouhana, J. M., Hormozdiari, F., Im, H., Kim-Hellmuth, S., Ardlie, K. G., Getz, G., Guigo, R., Im, H., Lappalainen, T., Montgomery, S. B., Im, H., Lappalainen, T., Lappalainen, T., Anand, S., Gabriel, S., Getz, G., Graubert, A., Hadley, K., Handsaker, R. E., Huang, K. H., Kashin, S., Li, X., MacArthur, D. G., Meier, S. R., Nedzel, J. L., Balliu, B., Conrad, D., Cotter, D. J., Das, S., de Goede, O. M., Eskin, E., Eulalio, T. Y., Ferraro, N. M., Garrido-Martin, D., Gay, N. R., Getz, G., Graubert, A., Guigo, R., Hadley, K., Hamel, A. R., Handsaker, R. E., He, Y., Hoffman, P. J., Hormozdiari, F., Hou, L., Huang, K. H., Im, H., Jo, B., Kasela, S., Kashin, S., Kellis, M., Kim-Hellmuth, S., Kwong, A., Lappalainen, T., Li, X., Li, X., Liang, Y., MacArthur, D. G., Mangul, S., Meier, S. R., Mohammadi, P., Montgomery, S. B., Munoz-Aguirre, M., Nachun, D. C., Nedzel, J. L., Nguyen, D. Y., Nobel, A. B., Park, Y., Reverter, F., Sabatti, C., Saha, A., Segre, A., Stephens, M., Strober, B. J., Teran, N. A., Todres, E., Vinuela, A., Wang, G., Wen, X., Wright, F., Wucher, V., Zou, Y., Ferreira, P. G., Li, G., Mele, M., Yeger-Lotem, E., Barcus, M. E., Bradbury, D., Krubit, T., McLean, J. A., Qi, L., Robinson, K., Roche, N., Smith, A. M., Tabor, D. 
E., Undale, A., Bridge, J., Brigham, L. E., Foster, B. A., Gillard, B. M., Hasz, R., Hunter, M., Johns, C., Johnson, M., Karasik, E., Kopen, G., Leinweber, W. F., McDonald, A., Moser, M. T., Myer, K., Ramsey, K. D., Roe, B., Shad, S., Thomas, J. A., Walters, G., Washington, M., Wheeler, J., Jewell, S. D., Rohrer, D. C., Valley, D. R., Davis, D. A., Mash, D. C., Branton, P. A., Sobin, L., Barker, L. K., Gardiner, H. M., Mosavel, M., Siminoff, L. A., Flicek, P., Haeussler, M., Juettemann, T., Kent, W., Lee, C. M., Powell, C. C., Rosenbloom, K. R., Ruffier, M., Sheppard, D., Taylor, K., Trevanion, S. J., Zerbino, D. R., Abell, N. S., Akey, J., Chen, L., Demanelis, K., Doherty, J. A., Feinberg, A. P., Hansen, K. D., Hickey, P. F., Hou, L., Jasmine, F., Jiang, L., Kaul, R., Kellis, M., Kibriya, M. G., Li, J., Li, Q., Lin, S., Linder, S. E., Montgomery, S. B., Oliva, M., Park, Y., Pierce, B. L., Rizzardi, L. F., Skol, A. D., Smith, K. S., Snyder, M., Stamatoyannopoulos, J., Tang, H., Wang, M., Carithers, L. J., Guan, P., Koester, S. E., Little, A., Moore, H. M., Nierras, C. R., Rao, A. K., Vaught, J. B., Volpi, S., GTEx Consortium 2020; 369 (6509): 1318-+
  • Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature genetics Li, X., Li, Z., Zhou, H., Gaynor, S. M., Liu, Y., Chen, H., Sun, R., Dey, R., Arnett, D. K., Aslibekyan, S., Ballantyne, C. M., Bielak, L. F., Blangero, J., Boerwinkle, E., Bowden, D. W., Broome, J. G., Conomos, M. P., Correa, A., Cupples, L. A., Curran, J. E., Freedman, B. I., Guo, X., Hindy, G., Irvin, M. R., Kardia, S. L., Kathiresan, S., Khan, A. T., Kooperberg, C. L., Laurie, C. C., Liu, X. S., Mahaney, M. C., Manichaikul, A. W., Martin, L. W., Mathias, R. A., McGarvey, S. T., Mitchell, B. D., Montasser, M. E., Moore, J. E., Morrison, A. C., O'Connell, J. R., Palmer, N. D., Pampana, A., Peralta, J. M., Peyser, P. A., Psaty, B. M., Redline, S., Rice, K. M., Rich, S. S., Smith, J. A., Tiwari, H. K., Tsai, M. Y., Vasan, R. S., Wang, F. F., Weeks, D. E., Weng, Z., Wilson, J. G., Yanek, L. R., NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Neale, B. M., Sunyaev, S. R., Abecasis, G. R., Rotter, J. I., Willer, C. J., Peloso, G. M., Natarajan, P., Lin, X., Abe, N., Abecasis, G. R., Aguet, F., Albert, C., Almasy, L., Alonso, A., Ament, S., Anderson, P., Anugu, P., Applebaum-Bowden, D., Ardlie, K., Arking, D., Arnett, D. K., Ashley-Koch, A., Aslibekyan, S., Assimes, T., Auer, P., Avramopoulos, D., Barnard, J., Barnes, K., Barr, R. G., Barron-Casella, E., Barwick, L., Beaty, T., Beck, G., Becker, D., Becker, L., Beer, R., Beitelshees, A., Benjamin, E., Benos, T., Bezerra, M., Bielak, L. F., Bis, J., Blackwell, T., Blangero, J., Boerwinkle, E., Bowden, D. W., Bowler, R., Brody, J., Broeckel, U., Broome, J. G., Bunting, K., Burchard, E., Bustamante, C., Buth, E., Cade, B., Cardwell, J., Carey, V., Carty, C., Casaburi, R., Casella, J., Castaldi, P., Chaffin, M., Chang, C., Chang, Y., Chasman, D., Chavan, S., Chen, B., Chen, W., Chen, Y. 
I., Cho, M., Choi, S. H., Chuang, L., Chung, M., Chung, R., Clish, C., Comhair, S., Conomos, M. P., Cornell, E., Correa, A., Crandall, C., Crapo, J., Cupples, L. A., Curran, J. E., Curtis, J., Custer, B., Damcott, C., Darbar, D., Das, S., David, S., Davis, C., Daya, M., de Andrade, M., Fuentes, L. d., DeBaun, M., Deka, R., DeMeo, D., Devine, S., Duan, Q., Duggirala, R., Durda, J. P., Dutcher, S., Eaton, C., Ekunwe, L., El Boueiz, A., Ellinor, P., Emery, L., Erzurum, S., Farber, C., Fingerlin, T., Flickinger, M., Fornage, M., Franceschini, N., Frazar, C., Fu, M., Fullerton, S. M., Fulton, L., Gabriel, S., Gan, W., Gao, S., Gao, Y., Gass, M., Gelb, B., Geng, X. P., Geraci, M., Germer, S., Gerszten, R., Ghosh, A., Gibbs, R., Gignoux, C., Gladwin, M., Glahn, D., Gogarten, S., Gong, D., Goring, H., Graw, S., Grine, D., Gu, C. C., Guan, Y., Guo, X., Gupta, N., Haessler, J., Hall, M., Harris, D., Hawley, N. L., He, J., Heckbert, S., Hernandez, R., Herrington, D., Hersh, C., Hidalgo, B., Hixson, J., Hobbs, B., Hokanson, J., Hong, E., Hoth, K., Hsiung, C. A., Hung, Y., Huston, H., Hwu, C. M., Irvin, M. R., Jackson, R., Jain, D., Jaquish, C., Jhun, M. A., Johnsen, J., Johnson, A., Johnson, C., Johnston, R., Jones, K., Kang, H. M., Kaplan, R., Kardia, S. L., Kathiresan, S., Kelly, S., Kenny, E., Kessler, M., Khan, A. T., Kim, W., Kinney, G., Konkle, B., Kooperberg, C. L., Kramer, H., Lange, C., Lange, E., Lange, L., Laurie, C. C., Laurie, C., LeBoff, M., Lee, J., Lee, S. S., Lee, W., LeFaive, J., Levine, D., Levy, D., Lewis, J., Li, X., Li, Y., Lin, H., Lin, H., Lin, K. H., Lin, X., Liu, S., Liu, Y., Liu, Y., Loos, R. J., Lubitz, S., Lunetta, K., Luo, J., Mahaney, M. C., Make, B., Manichaikul, A. W., Manson, J., Margolin, L., Martin, L. W., Mathai, S., Mathias, R. A., May, S., McArdle, P., McDonald, M., McFarland, S., McGarvey, S. T., McGoldrick, D., McHugh, C., Mei, H., Mestroni, L., Meyers, D. A., Mikulla, J., Min, N., Minear, M., Minster, R. L., Mitchell, B. 
D., Moll, M., Montasser, M. E., Montgomery, C., Moscati, A., Musani, S., Mwasongwe, S., Mychaleckyj, J. C., Nadkarni, G., Naik, R., Naseri, T., Natarajan, P., Nekhai, S., Nelson, S. C., Neltner, B., Nickerson, D., North, K., O'Connell, J. R., O'Connor, T., Ochs-Balcom, H., Paik, D., Palmer, N. D., Pankow, J., Papanicolaou, G., Parsa, A., Peralta, J. M., Perez, M., Perry, J., Peters, U., Peyser, P. A., Phillips, L. S., Pollin, T., Post, W., Becker, J. P., Boorgula, M. P., Preuss, M., Psaty, B. M., Qasba, P., Qiao, D., Qin, Z., Rafaels, N., Raffield, L., Vasan, R. S., Rao, D. C., Rasmussen-Torvik, L., Ratan, A., Redline, S., Reed, R., Regan, E., Reiner, A., Reupena, M. S., Rice, K. M., Rich, S. S., Roden, D., Roselli, C., Rotter, J. I., Ruczinski, I., Russell, P., Ruuska, S., Ryan, K., Sabino, E. C., Saleheen, D., Salimi, S., Salzberg, S., Sandow, K., Sankaran, V. G., Scheller, C., Schmidt, E., Schwander, K., Schwartz, D., Sciurba, F., Seidman, C., Seidman, J., Sheehan, V., Sherman, S. L., Shetty, A., Shetty, A., Sheu, W. H., Shoemaker, M. B., Silver, B., Silverman, E., Smith, J. A., Smith, J., Smith, N., Smith, T., Smoller, S., Snively, B., Snyder, M., Sofer, T., Sotoodehnia, N., Stilp, A. M., Storm, G., Streeten, E., Su, J. L., Sung, Y. J., Sylvia, J., Szpiro, A., Sztalryd, C., Taliun, D., Tang, H., Taub, M., Taylor, K. D., Taylor, M., Taylor, S., Telen, M., Thornton, T. A., Threlkeld, M., Tinker, L., Tirschwell, D., Tishkoff, S., Tiwari, H. K., Tong, C., Tracy, R., Tsai, M. Y., Vaidya, D., Van Den Berg, D., VandeHaar, P., Vrieze, S., Walker, T., Wallace, R., Walts, A., Wang, F. F., Wang, H., Watson, K., Weeks, D. E., Weir, B., Weiss, S., Weng, L., Wessel, J., Willer, C. J., Williams, K., Williams, L. K., Wilson, C., Wilson, J. G., Wong, Q., Wu, J., Xu, H., Yanek, L. R., Yang, I., Yang, R., Zaghloul, N., Zekavat, M., Zhang, Y., Zhao, S. X., Zhao, W., Zhi, D., Zhou, X., Zhu, X., Zody, M., Zoellner, S., Abdalla, M., Abecasis, G. R., Arnett, D. 
K., Aslibekyan, S., Assimes, T., Atkinson, E., Ballantyne, C. M., Beitelshees, A., Bielak, L. F., Bis, J., Bodea, C., Boerwinkle, E., Bowden, D. W., Brody, J., Cade, B., Carlson, J., Chang, I., Chen, Y. I., Chun, S., Chung, R., Conomos, M. P., Correa, A., Cupples, L. A., Damcott, C., de Vries, P., Do, R., Elliott, A., Fu, M., Ganna, A., Gong, D., Graham, S., Haas, M., Haring, B., He, J., Heckbert, S., Himes, B., Hixson, J., Irvin, M. R., Jain, D., Jarvik, G., Jhun, M. A., Jiang, J., Jun, G., Kalyani, R., Kardia, S. L., Kathiresan, S., Khera, A., Klarin, D., Kooperberg, C. L., Kral, B., Lange, L., Laurie, C. C., Laurie, C., Lemaitre, R., Li, Z., Li, X., Lin, X., Mahaney, M. C., Manichaikul, A. W., Martin, L. W., Mathias, R. A., Mathur, R., McGarvey, S. T., McHugh, C., McLenithan, J., Mikulla, J., Mitchell, B. D., Montasser, M. E., Moran, A., Morrison, A. C., Nakao, T., Natarajan, P., Nickerson, D., North, K., O'Connell, J. R., O'Donnell, C., Palmer, N. D., Pampana, A., Patel, A., Peloso, G. M., Perry, J., Peters, U., Peyser, P. A., Pirruccello, J., Pollin, T., Preuss, M., Psaty, B. M., Rao, D. C., Redline, S., Reed, R., Reiner, A., Rich, S. S., Rosenthal, S., Rotter, J. I., Schoenberg, J., Selvaraj, M. S., Sheu, W. H., Smith, J. A., Sofer, T., Stilp, A. M., Sunyaev, S. R., Surakka, I., Sztalryd, C., Tang, H., Taylor, K. D., Tsai, M. Y., Uddin, M. M., Urbut, S., Verbanck, M., Von Holle, A., Wang, H., Wang, F. F., Wiggins, K., Willer, C. J., Wilson, J. G., Wolford, B., Xu, H., Yanek, L. R., Zaghloul, N., Zekavat, M., Zhang, J. 2020

    Abstract

    Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.

    View details for DOI 10.1038/s41588-020-0676-4

    View details for PubMedID 32839606

  • Multi-faceted epigenetic dysregulation of gene expression promotes esophageal squamous cell carcinoma. Nature communications Cao, W., Lee, H., Wu, W., Zaman, A., McCorkle, S., Yan, M., Chen, J., Xing, Q., Sinnott-Armstrong, N., Xu, H., Sailani, M. R., Tang, W., Cui, Y., Liu, J., Guan, H., Lv, P., Sun, X., Sun, L., Han, P., Lou, Y., Chang, J., Wang, J., Gao, Y., Guo, J., Schenk, G., Shain, A. H., Biddle, F. G., Collisson, E., Snyder, M., Bivona, T. G. 2020; 11 (1): 3675

    Abstract

    Epigenetic landscapes can shape physiologic and disease phenotypes. We used integrative, high resolution multi-omics methods to delineate the methylome landscape and characterize the oncogenic drivers of esophageal squamous cell carcinoma (ESCC). We found 98% of CpGs are hypomethylated across the ESCC genome. Hypo-methylated regions are enriched in areas with heterochromatin binding markers (H3K9me3, H3K27me3), while hyper-methylated regions are enriched in polycomb repressive complex (EZH2/SUZ12) recognizing regions. Altered methylation in promoters, enhancers, and gene bodies, as well as in polycomb repressive complex occupancy and CTCF binding sites are associated with cancer-specific gene dysregulation. Epigenetic-mediated activation of non-canonical WNT/beta-catenin/MMP signaling and a YY1/lncRNA ESCCAL-1/ribosomal protein network are uncovered and validated as potential novel ESCC driver alterations. This study advances our understanding of how epigenetic landscapes shape cancer pathogenesis and provides a resource for biomarker and target discovery.

    View details for DOI 10.1038/s41467-020-17227-z

    View details for PubMedID 32699215

  • Prevention of Severe Intestinal Barrier Dysfunction Through a Single-Species Probiotics Is Associated With the Activation of Microbiome-Mediated Glutamate-Glutamine Biosynthesis. Shock (Augusta, Ga.) Leng, Y., Jiang, C., Xing, X., Tsai, M., Snyder, M., Zhai, A., Yao, G. 2020

    Abstract

    INTRODUCTION: Intra-abdominal hypertension (IAH), the leading complication in the intensive care unit, significantly disturbs the gut microbial composition by decreasing the relative abundance of Lactobacillus and increasing the relative abundance of opportunistic infectious bacteria. METHODS: To evaluate the preventative effect of Lactobacillus-based probiotics on IAH-induced intestinal barrier damages, a single-species probiotics (L92) and a multi-species probiotics (VSL#3) were introduced orally to Sprague-Dawley rats for 7 days before inducing IAH. The intestinal histology and permeability to macromolecules (fluorescein isothiocyanate, FITC-dextran, N = 8 for each group), the parameters of immunomodulatory and oxidative responses [monocyte chemotactic protein 1 (MCP-1), interleukin-1beta (IL-1beta), interleukin-4 (IL-4), interleukin-10 (IL-10), malonaldehyde (MDA), glutathione peroxidase (GSH-Px), catalase (CAT), and superoxide dismutase (SOD); N = 4 for each group], and the microbiome profiling (N = 4 for each group) were analyzed. RESULTS: 7-day pretreatments of L92 significantly alleviated the IAH-induced increase in intestinal permeability to FITC-dextran and histological damage (P < 0.0001), accompanied with the suppression of inflammatory and oxidative activation. The increase of MCP-1 and IL-1beta was significantly inhibited (P < 0.05); the anti-inflammatory cytokines, IL-4 and IL-10, were maintained at high levels; and the suppression of CAT (P < 0.05) was significantly reversed when pretreated with L92. On the contrary, no significant protective effects were observed in the VSL#3-pretreated group. Among the 84 identified species, 260 MetaCyc pathways, and 217 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, the protective effects of L92 were correlated with an increased relative abundance of Bacteroides finegoldii, Odoribacter splanchnicus, and the global activation of amino acid biosynthesis pathways, especially the glutamate-glutamine biosynthesis pathway. CONCLUSIONS: 7-day pretreatment with a single-species probiotics can prevent IAH-induced severe intestinal barrier dysfunction, potentially through microbial modulation.

    View details for DOI 10.1097/SHK.0000000000001593

    View details for PubMedID 32694391

  • Physiological blood-brain transport is impaired with age by a shift in transcytosis. Nature Yang, A. C., Stevens, M. Y., Chen, M. B., Lee, D. P., Stahli, D., Gate, D., Contrepois, K., Chen, W., Iram, T., Zhang, L., Vest, R. T., Chaney, A., Lehallier, B., Olsson, N., du Bois, H., Hsieh, R., Cropper, H. C., Berdnik, D., Li, L., Wang, E. Y., Traber, G. M., Bertozzi, C. R., Luo, J., Snyder, M. P., Elias, J. E., Quake, S. R., James, M. L., Wyss-Coray, T. 2020

    Abstract

    The vascular interface of the brain, known as the blood-brain barrier (BBB), is understood to maintain brain function in part via its low transcellular permeability. Yet, recent studies have demonstrated that brain ageing is sensitive to circulatory proteins. Thus, it is unclear whether permeability to individually injected exogenous tracers, as is standard in BBB studies, fully represents blood-to-brain transport. Here we label hundreds of proteins constituting the mouse blood plasma proteome, and upon their systemic administration, study the BBB with its physiological ligand. We find that plasma proteins readily permeate the healthy brain parenchyma, with transport maintained by BBB-specific transcriptional programmes. Unlike IgG antibody, plasma protein uptake diminishes in the aged brain, driven by an age-related shift in transport from ligand-specific receptor-mediated to non-specific caveolar transcytosis. This age-related shift occurs alongside a specific loss of pericyte coverage. Pharmacological inhibition of the age-upregulated phosphatase ALPL, a predicted negative regulator of transport, enhances brain uptake of therapeutically relevant transferrin, transferrin receptor antibody and plasma. These findings reveal the extent of physiological protein transcytosis to the healthy brain, a mechanism of widespread BBB dysfunction with age and a strategy for enhanced drug delivery.

    View details for DOI 10.1038/s41586-020-2453-z

    View details for PubMedID 32612231

  • Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise. Cell Sanford, J. A., Nogiec, C. D., Lindholm, M. E., Adkins, J. N., Amar, D., Dasari, S., Drugan, J. K., Fernandez, F. M., Radom-Aizik, S., Schenk, S., Snyder, M. P., Tracy, R. P., Vanderboom, P., Trappe, S., Walsh, M. J., Molecular Transducers of Physical Activity Consortium, Adkins, J. N., Amar, D., Dasari, S., Drugan, J. K., Evans, C. R., Fernandez, F. M., Li, Y., Lindholm, M. E., Nogiec, C. D., Radom-Aizik, S., Sanford, J. A., Schenk, S., Snyder, M. P., Tomlinson, L., Tracy, R. P., Trappe, S., Vanderboom, P., Walsh, M. J., Alekel, D. L., Bekirov, I., Boyce, A. T., Boyington, J., Fleg, J. L., Joseph, L. J., Laughlin, M. R., Maruvada, P., Morris, S. A., McGowan, J. A., Nierras, C., Pai, V., Peterson, C., Ramos, E., Roary, M. C., Williams, J. P., Xia, A., Cornell, E., Rooney, J., Miller, M. E., Ambrosius, W. T., Rushing, S., Stowe, C. L., Rejeski, W. J., Nicklas, B. J., Pahor, M., Lu, C., Trappe, T., Chambers, T., Raue, U., Lester, B., Bergman, B. C., Bessesen, D. H., Jankowski, C. M., Kohrt, W. M., Melanson, E. L., Moreau, K. L., Schauer, I. E., Schwartz, R. S., Kraus, W. E., Slentz, C. A., Huffman, K. M., Johnson, J. L., Willis, L. H., Kelly, L., Houmard, J. A., Dubis, G., Broskey, N., Goodpaster, B. H., Sparks, L. M., Coen, P. M., Cooper, D. M., Haddad, F., Rankinen, T., Ravussin, E., Johannsen, N., Harris, M., Jakicic, J. M., Newman, A. B., Forman, D. D., Kershaw, E., Rogers, R. J., Nindl, B. C., Page, L. C., Stefanovic-Racic, M., Barr, S. L., Rasmussen, B. B., Moro, T., Paddon-Jones, D., Volpi, E., Spratt, H., Musi, N., Espinoza, S., Patel, D., Serra, M., Gelfond, J., Burns, A., Bamman, M. M., Buford, T. W., Cutter, G. R., Bodine, S. C., Esser, K., Farrar, R. P., Goodyear, L. J., Hirshman, M. F., Albertson, B. G., Qian, W., Piehowski, P., Gritsenko, M. A., Monore, M. E., Petyuk, V. A., McDermott, J. E., Hansen, J. 
N., Hutchison, C., Moore, S., Gaul, D. A., Clish, C. B., Avila-Pacheco, J., Dennis, C., Kellis, M., Carr, S., Jean-Beltran, P. M., Keshishian, H., Mani, D. R., Clauser, K., Krug, K., Mundorff, C., Pearce, C., Ivanova, A. A., Ortlund, E. A., Maner-Smith, K., Uppal, K., Zhang, T., Sealfon, S. C., Zavlasky, E., Nair, V., Li, S., Jain, N., Ge, Y., Sun, Y., Nudelman, G., Ruf-Zamojski, F., Smith, G., Pincas, N., Rubenstein, A., Amper, M. A., Seenarine, N., Lappalainen, T., Lanza, I. R., Nair, K. S., Klaus, K., Montgomery, S. B., Smith, K. S., Gay, N. R., Zhao, B., Hung, C. J., Zebarjadi, N., Balliu, B., Fresard, L., Burant, C. F., Li, J. Z., Kachman, M., Soni, T., Raskind, A. B., Gerszten, R., Robbins, J., Ilkayeva, O., Muehlbauer, M. J., Newgard, C. B., Ashley, E. A., Wheeler, M. T., Jimenez-Morales, D., Raja, A., Dalton, K. P., Zhen, J., Kim, Y. S., Christle, J. W., Marwaha, S., Chin, E. T., Hershman, S. G., Hastie, T., Tibshirani, R., Rivas, M. A. 2020; 181 (7): 1464–74

    Abstract

    Exercise provides a robust physiological stimulus that evokes cross-talk among multiple tissues that when repeated regularly (i.e., training) improves physiological capacity, benefits numerous organ systems, and decreases the risk for premature mortality. However, a gap remains in identifying the detailed molecular signals induced by exercise that benefits health and prevents disease. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) was established to address this gap and generate a molecular map of exercise. Preclinical and clinical studies will examine the systemic effects of endurance and resistance exercise across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. From this multi-omic and bioinformatic analysis, a molecular map of exercise will be established. Altogether, MoTrPAC will provide a public database that is expected to enhance our understanding of the health benefits of exercise and to provide insight into how physical activity mitigates disease.

    View details for DOI 10.1016/j.cell.2020.06.004

    View details for PubMedID 32589957

  • Towards personalized medicine in maternal and child health: integrating biologic and social determinants. Pediatric research Stevenson, D. K., Wong, R. J., Aghaeepour, N., Maric, I., Angst, M. S., Contrepois, K., Darmstadt, G. L., Druzin, M. L., Eisenberg, M. L., Gaudilliere, B., Gibbs, R. S., Gotlib, I. H., Gould, J. B., Lee, H. C., Ling, X. B., Mayo, J. A., Moufarrej, M. N., Quaintance, C. C., Quake, S. R., Relman, D. A., Sirota, M., Snyder, M. P., Sylvester, K. G., Hao, S., Wise, P. H., Shaw, G. M., Katz, M. 2020

    View details for DOI 10.1038/s41390-020-0981-8

    View details for PubMedID 32454518

  • The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution. Cell Rozenblatt-Rosen, O., Regev, A., Oberdoerffer, P., Nawy, T., Hupalowska, A., Rood, J. E., Ashenberg, O., Cerami, E., Coffey, R. J., Demir, E., Ding, L., Esplin, E. D., Ford, J. M., Goecks, J., Ghosh, S., Gray, J. W., Guinney, J., Hanlon, S. E., Hughes, S. K., Hwang, E. S., Iacobuzio-Donahue, C. A., Jane-Valbuena, J., Johnson, B. E., Lau, K. S., Lively, T., Mazzilli, S. A., Pe'er, D., Santagata, S., Shalek, A. K., Schapiro, D., Snyder, M. P., Sorger, P. K., Spira, A. E., Srivastava, S., Tan, K., West, R. B., Williams, E. H., Human Tumor Atlas Network, Aberle, D., Achilefu, S. I., Ademuyiwa, F. O., Adey, A. C., Aft, R. L., Agarwal, R., Aguilar, R. A., Alikarami, F., Allaj, V., Amos, C., Anders, R. A., Angelo, M. R., Anton, K., Ashenberg, O., Aster, J. C., Babur, O., Bahmani, A., Balsubramani, A., Barrett, D., Beane, J., Bender, D. E., Bernt, K., Berry, L., Betts, C. B., Bletz, J., Blise, K., Boire, A., Boland, G., Borowsky, A., Bosse, K., Bott, M., Boyden, E., Brooks, J., Bueno, R., Burlingame, E. A., Cai, Q., Campbell, J., Caravan, W., Cerami, E., Chaib, H., Chan, J. M., Chang, Y. H., Chatterjee, D., Chaudhary, O., Chen, A. A., Chen, B., Chen, C., Chen, C., Chen, F., Chen, Y., Chheda, M. G., Chin, K., Chiu, R., Chu, S., Chuaqui, R., Chun, J., Cisneros, L., Coffey, R. J., Colditz, G. A., Cole, K., Collins, N., Contrepois, K., Coussens, L. M., Creason, A. L., Crichton, D., Curtis, C., Davidsen, T., Davies, S. R., de Bruijn, I., Dellostritto, L., De Marzo, A., Demir, E., DeNardo, D. G., Diep, D., Ding, L., Diskin, S., Doan, X., Drewes, J., Dubinett, S., Dyer, M., Egger, J., Eng, J., Engelhardt, B., Erwin, G., Esplin, E. D., Esserman, L., Felmeister, A., Feiler, H. S., Fields, R. C., Fisher, S., Flaherty, K., Flournoy, J., Ford, J. M., Fortunato, A., Frangieh, A., Frye, J. L., Fulton, R. S., Galipeau, D., Gan, S., Gao, J., Gao, L., Gao, P., Gao, V. 
R., Geiger, T., George, A., Getz, G., Ghosh, S., Giannakis, M., Gibbs, D. L., Gillanders, W. E., Goecks, J., Goedegebuure, S. P., Gould, A., Gowers, K., Gray, J. W., Greenleaf, W., Gresham, J., Guerriero, J. L., Guha, T. K., Guimaraes, A. R., Guinney, J., Gutman, D., Hacohen, N., Hanlon, S., Hansen, C. R., Harismendy, O., Harris, K. A., Hata, A., Hayashi, A., Heiser, C., Helvie, K., Herndon, J. M., Hirst, G., Hodi, F., Hollmann, T., Horning, A., Hsieh, J. J., Hughes, S., Huh, W. J., Hunger, S., Hwang, S. E., Iacobuzio-Donahue, C. A., Ijaz, H., Izar, B., Jacobson, C. A., Janes, S., Jane-Valbuena, J., Jayasinghe, R. G., Jiang, L., Johnson, B. E., Johnson, B., Ju, T., Kadara, H., Kaestner, K., Kagan, J., Kalinke, L., Keith, R., Khan, A., Kibbe, W., Kim, A. H., Kim, E., Kim, J., Kolodzie, A., Kopytra, M., Kotler, E., Krueger, R., Krysan, K., Kundaje, A., Ladabaum, U., Lake, B. B., Lam, H., Laquindanum, R., Lau, K. S., Laughney, A. M., Lee, H., Lenburg, M., Leonard, C., Leshchiner, I., Levy, R., Li, J., Lian, C. G., Lim, K., Lin, J., Lin, Y., Liu, Q., Liu, R., Lively, T., Longabaugh, W. J., Longacre, T., Ma, C. X., Macedonia, M. C., Madison, T., Maher, C. A., Maitra, A., Makinen, N., Makowski, D., Maley, C., Maliga, Z., Mallo, D., Maris, J., Markham, N., Marks, J., Martinez, D., Mashl, R. J., Masilionais, I., Mason, J., Massague, J., Massion, P., Mattar, M., Mazurchuk, R., Mazutis, L., Mazzilli, S. A., McKinley, E. T., McMichael, J. F., Merrick, D., Meyerson, M., Miessner, J. R., Mills, G. B., Mills, M., Mondal, S. B., Mori, M., Mori, Y., Moses, E., Mosse, Y., Muhlich, J. L., Murphy, G. F., Navin, N. E., Nawy, T., Nederlof, M., Ness, R., Nevins, S., Nikolov, M., Nirmal, A. J., Nolan, G., Novikov, E., Oberdoerffer, P., O'Connell, B., Offin, M., Oh, S. T., Olson, A., Ooms, A., Ossandon, M., Owzar, K., Parmar, S., Patel, T., Patti, G. J., Pe'er, D., Pe'er, I., Peng, T., Persson, D., Petty, M., Pfister, H., Polyak, K., Pourfarhangi, K., Puram, S. 
V., Qiu, Q., Quintanal-Villalonga, A., Raj, A., Ramirez-Solano, M., Rashid, R., Reeb, A. N., Regev, A., Reid, M., Resnick, A., Reynolds, S. M., Riesterer, J. L., Rodig, S., Roland, J. T., Rosenfield, S., Rotem, A., Roy, S., Rozenblatt-Rosen, O., Rudin, C. M., Ryser, M. D., Santagata, S., Santi-Vicini, M., Sato, K., Schapiro, D., Schrag, D., Schultz, N., Sears, C. L., Sears, R. C., Sen, S., Sen, T., Shalek, A., Sheng, J., Sheng, Q., Shoghi, K. I., Shrubsole, M. J., Shyr, Y., Sibley, A. B., Siex, K., Simmons, A. J., Singer, D. S., Sivagnanam, S., Slyper, M., Snyder, M. P., Sokolov, A., Song, S., Sorger, P. K., Southard-Smith, A., Spira, A., Srivastava, S., Stein, J., Storm, P., Stover, E., Strand, S. H., Su, T., Sudar, D., Sullivan, R., Surrey, L., Suva, M., Tan, K., Terekhanova, N. V., Ternes, L., Thammavong, L., Thibault, G., Thomas, G. V., Thorsson, V., Todres, E., Tran, L., Tyler, M., Uzun, Y., Vachani, A., Van Allen, E., Vandekar, S., Veis, D. J., Vigneau, S., Vossough, A., Waanders, A., Wagle, N., Wang, L., Wendl, M. C., West, R., Williams, E. H., Wu, C., Wu, H., Wu, H., Wyczalkowski, M. A., Xie, Y., Yang, X., Yapp, C., Yu, W., Yuan, Y., Zhang, D., Zhang, K., Zhang, M., Zhang, N., Zhang, Y., Zhao, Y., Zhou, D. C., Zhou, Z., Zhu, H., Zhu, Q., Zhu, X., Zhu, Y., Zhuang, X. 2020; 181 (2): 236–49

    Abstract

    Crucial transitions in cancer, including tumor initiation, local expansion, metastasis, and therapeutic resistance, involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.

    View details for DOI 10.1016/j.cell.2020.03.053

    View details for PubMedID 32302568

  • Humans Are Selectively Exposed to Pneumocystis jirovecii. mBio Cisse, O. H., Ma, L., Jiang, C., Snyder, M., Kovacs, J. A. 2020; 11 (2)

    Abstract

    Environmental exposure has a significant impact on human health. While some airborne fungi can cause life-threatening infections, the impact of environment on fungal spore dispersal and transmission is poorly understood. The democratization of shotgun metagenomics allows us to explore important questions about fungal propagation. We focus on Pneumocystis, a genus of host-specific fungi that infect mammals via airborne particles. In humans, Pneumocystis jirovecii causes lethal infections in immunocompromised patients if untreated, although its environmental reservoir and transmission route remain unclear. Here, we attempt to clarify, by analyzing human exposome metagenomic data sets, whether humans are exposed to different Pneumocystis species present in the air but only P. jirovecii cells are able to replicate or whether they are selectively exposed to P. jirovecii. Our analysis supports the latter hypothesis, which is consistent with a local transmission model. These data also suggest that healthy carriers are a major driver for the transmission.

    View details for DOI 10.1128/mBio.03138-19

    View details for PubMedID 32156824

  • Systematic Identification of Regulators of Oxidative Stress Reveals Non-canonical Roles for Peroxisomal Import and the Pentose Phosphate Pathway. Cell reports Dubreuil, M. M., Morgens, D. W., Okumoto, K., Honsho, M., Contrepois, K., Lee-McMullen, B., Traber, G. M., Sood, R. S., Dixon, S. J., Snyder, M. P., Fujiki, Y., Bassik, M. C. 2020; 30 (5): 1417

    Abstract

    Reactive oxygen species (ROS) play critical roles in metabolism and disease, yet a comprehensive analysis of the cellular response to oxidative stress is lacking. To systematically identify regulators of oxidative stress, we conducted genome-wide Cas9/CRISPR and shRNA screens. This revealed a detailed picture of diverse pathways that control oxidative stress response, ranging from the TCA cycle and DNA repair machineries to iron transport, trafficking, and metabolism. Paradoxically, disrupting the pentose phosphate pathway (PPP) at the level of phosphogluconate dehydrogenase (PGD) protects cells against ROS. This dramatically alters metabolites in the PPP, consistent with rewiring of upper glycolysis to promote antioxidant production. In addition, disruption of peroxisomal import unexpectedly increases resistance to oxidative stress by altering the localization of catalase. Together, these studies provide insights into the roles of peroxisomal matrix import and the PPP in redox biology and represent a rich resource for understanding the cellular response to oxidative stress.

    View details for DOI 10.1016/j.celrep.2020.01.013

    View details for PubMedID 32023459

  • The MEK5-ERK5 kinase axis controls lipid metabolism in small cell lung cancer. Cancer research Cristea, S., Coles, G. L., Hornburg, D., Gershkovitz, M., Arand, J., Cao, S., Sen, T., Williamson, S. C., Kim, J. W., Drainas, A. P., He, A., Le Cam, L., Byers, L. A., Snyder, M. P., Contrepois, K., Sage, J. 2020

    Abstract

    Small cell lung cancer (SCLC) is an aggressive form of lung cancer with dismal survival rates. While kinases often play key roles driving tumorigenesis, there are strikingly few kinases known to promote the development of SCLC. Here we investigated the contribution of the MAP kinase module MEK5/ERK5 to SCLC growth. MEK5 and ERK5 were required for optimal survival and expansion of SCLC cell lines in vitro and in vivo. Transcriptomics analyses identified a role for the MEK5-ERK5 axis in the metabolism of SCLC cells, including lipid metabolism. In-depth lipidomics analyses showed that loss of MEK5/ERK5 perturbs several lipid metabolism pathways, including the mevalonate pathway that controls cholesterol synthesis. Notably, depletion of MEK5/ERK5 sensitized SCLC cells to pharmacological inhibition of the mevalonate pathway by statins. These data identify a new MEK5-ERK5-lipid metabolism axis that promotes the growth of SCLC.

    View details for DOI 10.1158/0008-5472.CAN-19-1027

    View details for PubMedID 31969375

  • Multiomic immune clockworks of pregnancy. Seminars in immunopathology Peterson, L. S., Stelzer, I. A., Tsai, A. S., Ghaemi, M. S., Han, X. n., Ando, K. n., Winn, V. D., Martinez, N. R., Contrepois, K. n., Moufarrej, M. N., Quake, S. n., Relman, D. A., Snyder, M. P., Shaw, G. M., Stevenson, D. K., Wong, R. J., Arck, P. n., Angst, M. S., Aghaeepour, N. n., Gaudilliere, B. n. 2020

    Abstract

    Preterm birth is the leading cause of mortality in children under the age of five worldwide. Despite major efforts, we still lack the ability to accurately predict and effectively prevent preterm birth. While multiple factors contribute to preterm labor, dysregulation of the immunological adaptations required for the maintenance of a healthy pregnancy is at its pathophysiological core. Consequently, a precise understanding of these chronologically paced immune adaptations and of the biological pacemakers that synchronize the pregnancy "immune clock" is a critical first step towards identifying deviations that are hallmarks of preterm birth. Here, we will review key elements of the fetal, placental, and maternal pacemakers that program the immune clock of pregnancy. We will then emphasize multiomic studies that enable a more integrated view of pregnancy-related immune adaptations. Such multiomic assessments can strengthen the biological plausibility of immunological findings and increase the power of biological signatures predictive of preterm birth.

    View details for DOI 10.1007/s00281-019-00772-1

    View details for PubMedID 32020337

  • Cumulative Lifetime Burden of Cardiovascular Disease From Early Exposure to Air Pollution. Journal of the American Heart Association Kim, J. B., Prunicki, M. n., Haddad, F. n., Dant, C. n., Sampath, V. n., Patel, R. n., Smith, E. n., Akdis, C. n., Balmes, J. n., Snyder, M. P., Wu, J. C., Nadeau, K. C. 2020; 9 (6): e014944

    Abstract

    The disease burden associated with air pollution continues to grow. The World Health Organization (WHO) estimates that ≈7 million people worldwide die yearly from exposure to polluted air, half of which (3.3 million) are attributable to cardiovascular disease (CVD), greater than from major modifiable CVD risks including smoking, hypertension, hyperlipidemia, and diabetes mellitus. This serious and growing health threat is attributed to increasing urbanization of the world's populations with consequent exposure to polluted air. Especially vulnerable are the elderly, patients with pre-existing CVD, and children. The cumulative lifetime burden in children is of particular concern because their rapidly developing cardiopulmonary systems are more susceptible to damage and they spend more time outdoors and therefore inhale more pollutants. The WHO estimates that 93% of the world's children aged <15 years (1.8 billion children) breathe air that puts their health and development at risk. Here, we present growing scientific evidence, including from our own group, that chronic exposure to air pollution early in life is directly linked to development of major CVD risks, including obesity, hypertension, and metabolic disorders. In this review, we surveyed the literature for current knowledge of how pollution exposure early in life adversely impacts cardiovascular phenotypes, and lay the foundation for early intervention and other strategies that can help prevent this damage. We also discuss the need for better guidelines and additional research to validate exposure metrics and interventions that will ultimately help healthcare providers reduce the growing burden of CVD from pollution.

    View details for DOI 10.1161/JAHA.119.014944

    View details for PubMedID 32174249

  • A limited set of transcriptional programs define major cell types. Genome research Breschi, A. n., Muñoz-Aguirre, M. n., Wucher, V. n., Davis, C. A., Garrido-Martín, D. n., Djebali, S. n., Gillis, J. n., Pervouchine, D. D., Vlasova, A. n., Dobin, A. n., Zaleski, C. n., Drenkow, J. n., Danyko, C. n., Scavelli, A. n., Reverter, F. n., Snyder, M. P., Gingeras, T. R., Guigó, R. n. 2020; 30 (7): 1047–59

    Abstract

    We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.

    View details for DOI 10.1101/gr.263186.120

    View details for PubMedID 32759341

  • Multiomics Characterization of Preterm Birth in Low- and Middle-Income Countries. JAMA network open Jehan, F. n., Sazawal, S. n., Baqui, A. H., Nisar, M. I., Dhingra, U. n., Khanam, R. n., Ilyas, M. n., Dutta, A. n., Mitra, D. K., Mehmood, U. n., Deb, S. n., Mahmud, A. n., Hotwani, A. n., Ali, S. M., Rahman, S. n., Nizar, A. n., Ame, S. M., Moin, M. I., Muhammad, S. n., Chauhan, A. n., Begum, N. n., Khan, W. n., Das, S. n., Ahmed, S. n., Hasan, T. n., Khalid, J. n., Rizvi, S. J., Juma, M. H., Chowdhury, N. H., Kabir, F. n., Aftab, F. n., Quaiyum, A. n., Manu, A. n., Yoshida, S. n., Bahl, R. n., Rahman, A. n., Pervin, J. n., Winston, J. n., Musonda, P. n., Stringer, J. S., Litch, J. A., Ghaemi, M. S., Moufarrej, M. N., Contrepois, K. n., Chen, S. n., Stelzer, I. A., Stanley, N. n., Chang, A. L., Hammad, G. B., Wong, R. J., Liu, C. n., Quaintance, C. C., Culos, A. n., Espinosa, C. n., Xenochristou, M. n., Becker, M. n., Fallahzadeh, R. n., Ganio, E. n., Tsai, A. S., Gaudilliere, D. n., Tsai, E. S., Han, X. n., Ando, K. n., Tingle, M. n., Maric, I. n., Wise, P. H., Winn, V. D., Druzin, M. L., Gibbs, R. S., Darmstadt, G. L., Murray, J. C., Shaw, G. M., Stevenson, D. K., Snyder, M. P., Quake, S. R., Angst, M. S., Gaudilliere, B. n., Aghaeepour, N. n. 2020; 3 (12): e2029655

    Abstract

    Worldwide, preterm birth (PTB) is the single largest cause of deaths in the perinatal and neonatal period and is associated with increased morbidity in young children. The cause of PTB is multifactorial, and the development of generalizable biological models may enable early detection and guide therapeutic studies. To investigate the ability of transcriptomics and proteomics profiling of plasma and metabolomics analysis of urine to identify early biological measurements associated with PTB. This diagnostic/prognostic study analyzed plasma and urine samples collected from May 2014 to June 2017 from pregnant women in 5 biorepository cohorts in low- and middle-income countries (LMICs; ie, Matlab, Bangladesh; Lusaka, Zambia; Sylhet, Bangladesh; Karachi, Pakistan; and Pemba, Tanzania). These cohorts were established to study maternal and fetal outcomes and were supported by the Alliance for Maternal and Newborn Health Improvement and the Global Alliance to Prevent Prematurity and Stillbirth biorepositories. Data were analyzed from December 2018 to July 2019. Blood and urine specimens that were collected early during pregnancy (median sampling time of 13.6 weeks of gestation, according to ultrasonography) were processed, stored, and shipped to the laboratories under uniform protocols. Plasma samples were assayed for targeted measurement of proteins and untargeted cell-free ribonucleic acid profiling; urine samples were assayed for metabolites. The PTB phenotype was defined as the delivery of a live infant before completing 37 weeks of gestation. Of the 81 pregnant women included in this study, 39 had PTBs (48.1%) and 42 had term pregnancies (51.9%) (mean [SD] age of 24.8 [5.3] years). Univariate analysis demonstrated functional biological differences across the 5 cohorts. A cohort-adjusted machine learning algorithm was applied to each biological data set, and then a higher-level machine learning modeling combined the results into a final integrative model. The integrated model was more accurate, with an area under the receiver operating characteristic curve (AUROC) of 0.83 (95% CI, 0.72-0.91) compared with the models derived for each independent biological modality (transcriptomics AUROC, 0.73 [95% CI, 0.61-0.83]; metabolomics AUROC, 0.59 [95% CI, 0.47-0.72]; and proteomics AUROC, 0.75 [95% CI, 0.64-0.85]). Primary features associated with PTB included an inflammatory module as well as a metabolomic module measured in urine associated with the glutamine and glutamate metabolism and valine, leucine, and isoleucine biosynthesis pathways. This study found that, in LMICs and high PTB settings, major biological adaptations during term pregnancy follow a generalizable model and the predictive accuracy for PTB was augmented by combining various omics data sets, suggesting that PTB is a condition that manifests within multiple biological systems. These data sets, with machine learning partnerships, may be a key step in developing valuable predictive tests and intervention candidates for preventing PTB.

    View details for DOI 10.1001/jamanetworkopen.2020.29655

    View details for PubMedID 33337494

  • Remodeling of active endothelial enhancers is associated with aberrant gene-regulatory networks in pulmonary arterial hypertension. Nature communications Reyes-Palomares, A. n., Gu, M. n., Grubert, F. n., Berest, I. n., Sa, S. n., Kasowski, M. n., Arnold, C. n., Shuai, M. n., Srivas, R. n., Miao, S. n., Li, D. n., Snyder, M. P., Rabinovitch, M. n., Zaugg, J. B. 2020; 11 (1): 1673

    Abstract

    Environmental and epigenetic factors often play an important role in polygenic disorders. However, how such factors affect disease-specific tissues at the molecular level remains to be understood. Here, we address this in pulmonary arterial hypertension (PAH). We obtain pulmonary arterial endothelial cells (PAECs) from lungs of patients and controls (n = 19), and perform chromatin, transcriptomic and interaction profiling. Overall, we observe extensive remodeling at active enhancers in PAH PAECs and identify hundreds of differentially active TFs, yet find very few transcriptomic changes in steady-state. We devise a disease-specific enhancer-gene regulatory network and predict that primed enhancers in PAH PAECs are activated by the differentially active TFs, resulting in an aberrant response to endothelial signals, which could lead to disturbed angiogenesis and endothelial-to-mesenchymal transition. We validate these predictions for a selection of target genes in PAECs stimulated with TGF-β, VEGF or serotonin. Our study highlights the role of chromatin state and enhancers in disease-relevant cell types of PAH.

    View details for DOI 10.1038/s41467-020-15463-x

    View details for PubMedID 32245974

  • Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. Journal of the American Medical Informatics Association : JAMIA Yu, K. H., Wang, F. n., Berry, G. J., Ré, C. n., Altman, R. B., Snyder, M. n., Kohane, I. S. 2020; 27 (5): 757–69

    Abstract

    Non-small cell lung cancer is a leading cause of cancer death worldwide, and histopathological evaluation plays the primary role in its diagnosis. However, the morphological patterns associated with the molecular subtypes have not been systematically studied. To bridge this gap, we developed a quantitative histopathology analytic framework to identify the types and gene expression subtypes of non-small cell lung cancer objectively. We processed whole-slide histopathology images of lung adenocarcinoma (n = 427) and lung squamous cell carcinoma patients (n = 457) in the Cancer Genome Atlas. We built convolutional neural networks to classify histopathology images, evaluated their performance by the areas under the receiver-operating characteristic curves (AUCs), and validated the results in an independent cohort (n = 125). To establish neural networks for quantitative image analyses, we first built convolutional neural network models to identify tumor regions from adjacent dense benign tissues (AUCs > 0.935) and recapitulated expert pathologists' diagnosis (AUCs > 0.877), with the results validated in an independent cohort (AUCs = 0.726-0.864). We further demonstrated that quantitative histopathology morphology features identified the major transcriptomic subtypes of both adenocarcinoma and squamous cell carcinoma (P < .01). Our study is the first to classify the transcriptomic subtypes of non-small cell lung cancer using fully automated machine learning methods. Our approach does not rely on prior pathology knowledge and can discover novel clinically relevant histopathology patterns objectively. The developed procedure is generalizable to other tumor types or diseases.

    View details for DOI 10.1093/jamia/ocz230

    View details for PubMedID 32364237

  • Long-read assays shed new light on the transcriptome complexity of a viral pathogen. Scientific reports Tombácz, D. n., Prazsák, I. n., Csabai, Z. n., Moldován, N. n., Dénes, B. n., Snyder, M. n., Boldogkői, Z. n. 2020; 10 (1): 13822

    Abstract

    Characterization of global transcriptomes using conventional short-read sequencing is challenging due to the insensitivity of these platforms to transcript isoforms, multigenic RNA molecules, and transcriptional overlaps. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. Employment of these technologies has led to the redefinition of transcriptional complexities in reported organisms. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the vaccinia virus (VACV) transcriptome. We performed cDNA and direct RNA sequencing analyses and revealed an extremely complex transcriptional landscape of this virus. In particular, VACV genes produce large numbers of transcript isoforms that vary in their start and termination sites. A significant fraction of VACV transcripts start or end within coding regions of neighbouring genes. This study provides new insights into the transcriptomic profile of this viral pathogen.

    View details for DOI 10.1038/s41598-020-70794-5

    View details for PubMedID 32796917

  • Research on the Human Proteome Reaches a Major Milestone: >90% of Predicted Human Proteins Now Credibly Detected, According to the HUPO Human Proteome Project. Journal of proteome research Omenn, G. S., Lane, L. n., Overall, C. M., Cristea, I. M., Corrales, F. J., Lindskog, C. n., Paik, Y. K., Van Eyk, J. E., Liu, S. n., Pennington, S. R., Snyder, M. P., Baker, M. S., Bandeira, N. n., Aebersold, R. n., Moritz, R. L., Deutsch, E. W. 2020

    Abstract

    According to the 2020 Metrics of the HUPO Human Proteome Project (HPP), expression has now been detected at the protein level for >90% of the 19 773 predicted proteins coded in the human genome. The HPP annually reports on progress made throughout the world toward credibly identifying and characterizing the complete human protein parts list and promoting proteomics as an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2020-01 classified 17 874 proteins as PE1, having strong protein-level evidence, up 180 from 17 694 one year earlier. These represent 90.4% of the 19 773 predicted coding genes (all PE1,2,3,4 proteins in neXtProt). Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), was reduced by 230 from 2129 to 1899 since the neXtProt 2019-01 release. PeptideAtlas is the primary source of uniform reanalysis of raw mass spectrometry data for neXtProt, supplemented this year with extensive data from MassIVE. PeptideAtlas 2020-01 added 362 canonical proteins between 2019 and 2020 and MassIVE contributed 84 more, many of which converted PE1 entries based on non-MS evidence to the MS-based subgroup. The 19 Biology and Disease-driven B/D-HPP teams continue to pursue the identification of driver proteins that underlie disease states, the characterization of regulatory mechanisms controlling the functions of these proteins, their proteoforms, and their interactions, and the progression of transitions from correlation to coexpression to causal networks after system perturbations. And the Human Protein Atlas published Blood, Brain, and Metabolic Atlases.

    View details for DOI 10.1021/acs.jproteome.0c00485

    View details for PubMedID 32931287

  • A high-stringency blueprint of the human proteome. Nature communications Adhikari, S., Nice, E. C., Deutsch, E. W., Lane, L., Omenn, G. S., Pennington, S. R., Paik, Y., Overall, C. M., Corrales, F. J., Cristea, I. M., Van Eyk, J. E., Uhlen, M., Lindskog, C., Chan, D. W., Bairoch, A., Waddington, J. C., Justice, J. L., LaBaer, J., Rodriguez, H., He, F., Kostrzewa, M., Ping, P., Gundry, R. L., Stewart, P., Srivastava, S., Srivastava, S., Nogueira, F. C., Domont, G. B., Vandenbrouck, Y., Lam, M. P., Wennersten, S., Vizcaino, J. A., Wilkins, M., Schwenk, J. M., Lundberg, E., Bandeira, N., Marko-Varga, G., Weintraub, S. T., Pineau, C., Kusebauch, U., Moritz, R. L., Ahn, S. B., Palmblad, M., Snyder, M. P., Aebersold, R., Baker, M. S. 2020; 11 (1): 5301

    Abstract

    The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and enhancing accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics, and undertook reanalysis of previously deposited community data, continuously increasing the coverage of the human proteome. On the occasion of the HPP's tenth anniversary, we here report a 90.4% complete high-stringency human proteome blueprint. This knowledge is essential for discerning molecular processes in health and disease, as we demonstrate by highlighting potential roles the human proteome plays in our understanding, diagnosis and treatment of cancers, cardiovascular and infectious diseases.

    View details for DOI 10.1038/s41467-020-19045-9

    View details for PubMedID 33067450

  • iPSC Modeling of RBM20-Deficient DCM Identifies Upregulation of RBM20 as a Therapeutic Strategy. Cell reports Briganti, F. n., Sun, H. n., Wei, W. n., Wu, J. n., Zhu, C. n., Liss, M. n., Karakikes, I. n., Rego, S. n., Cipriano, A. n., Snyder, M. n., Meder, B. n., Xu, Z. n., Millat, G. n., Gotthardt, M. n., Mercola, M. n., Steinmetz, L. M. 2020; 32 (10): 108117

    Abstract

    Recent advances in induced pluripotent stem cell (iPSC) technology and directed differentiation of iPSCs into cardiomyocytes (iPSC-CMs) make it possible to model genetic heart disease in vitro. We apply CRISPR/Cas9 genome editing technology to introduce three RBM20 mutations in iPSCs and differentiate them into iPSC-CMs to establish an in vitro model of RBM20 mutant dilated cardiomyopathy (DCM). In iPSC-CMs harboring a known causal RBM20 variant, the splicing of RBM20 target genes, calcium handling, and contractility are impaired, consistent with the disease manifestation in patients. A variant (Pro633Leu) identified by exome sequencing of patient genomes displays the same disease phenotypes, thus establishing this variant as disease causing. We find that all-trans retinoic acid upregulates RBM20 expression and reverts the splicing, calcium handling, and contractility defects in iPSC-CMs with different causal RBM20 mutations. These results suggest that pharmacological upregulation of RBM20 expression is a promising therapeutic strategy for DCM patients with a heterozygous mutation in RBM20.

    View details for DOI 10.1016/j.celrep.2020.108117

    View details for PubMedID 32905764

  • RobNorm: Model-Based Robust Normalization Method for Labeled Quantitative Mass Spectrometry Proteomics Data. Bioinformatics (Oxford, England) Wang, M. n., Jiang, L. n., Jian, R. n., Chan, J. Y., Liu, Q. n., Snyder, M. P., Tang, H. n. 2020

    Abstract

    Data normalization is an important step in processing proteomics data generated in mass spectrometry (MS) experiments, which aims to reduce sample-level variation and facilitate comparisons of samples. Previously published methods for normalization primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data is generated from heterogeneous samples, such as from various tissue types. This led us to develop a novel data-driven method for improved normalization to correct the systematic bias while maintaining underlying biological heterogeneity. To robustly correct the systematic bias, we used the density-power-weight method to down-weigh outliers and extended the one-dimensional robust fitting method described in the previous work of Windham (1995) and Fujisawa and Eguchi (2008) to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and analysis of real data from the genotype-tissue expression (GTEx) project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples. The code is available at https://github.com/mwgrassgreen/RobNorm.

    View details for DOI 10.1093/bioinformatics/btaa904

    View details for PubMedID 33098413

  • Longitudinal Analysis of Serum Cytokine Levels and Gut Microbial Abundance Links IL-17/IL-22 with Clostridia and Insulin Sensitivity in Humans. Diabetes Zhou, X. n., Johnson, J. S., Spakowicz, D. n., Zhou, W. n., Zhou, Y. n., Sodergren, E. n., Snyder, M. n., Weinstock, G. M. 2020

    Abstract

    Recent studies using mouse models suggest that interaction between the gut microbiome and IL-17/IL-22 producing cells plays a role in the development of metabolic diseases. We investigated this relationship in humans using data from the prediabetes study of the Integrated Human Microbiome Project (iHMP). Specifically, we addressed the hypothesis that early in the onset of metabolic diseases there is a decline in serum levels of IL-17/IL-22, with concomitant changes in the gut microbiome. Clustering iHMP study participants on the basis of longitudinal IL-17/IL-22 profiles identified discrete groups. Individuals distinguished by low levels of IL-17/IL-22 were linked to established markers of metabolic disease, including insulin sensitivity. These individuals also displayed gut microbiome dysbiosis, characterized by decreased diversity, and IL-17/IL-22-related declines in the phylum Firmicutes, class Clostridia, and order Clostridiales. This ancillary analysis of the iHMP data therefore supports a link between the gut microbiome, IL-17/IL-22 and the onset of metabolic diseases. This raises the possibility for novel, microbiome-related therapeutic targets that may effectively alleviate metabolic diseases in humans as they do in animal models.

    View details for DOI 10.2337/db19-0592

    View details for PubMedID 32366680

  • Immunologic effects of forest fire exposure show increases in IL-1β and CRP. Allergy Prunicki, M. M., Dant, C. C., Cao, S., Maecker, H., Haddad, F., Kim, J. B., Snyder, M., Wu, J., Nadeau, K. 2020

    View details for DOI 10.1111/all.14251

    View details for PubMedID 32112439

  • Human-engineered Treg-like cells suppress FOXP3-deficient T cells but preserve adaptive immune responses in vivo. Clinical & translational immunology Sato, Y., Passerini, L., Piening, B. D., Uyeda, M. J., Goodwin, M., Gregori, S., Snyder, M. P., Bertaina, A., Roncarolo, M. G., Bacchetta, R. 2020; 9 (11): e1214

    Abstract

    Genetic or acquired defects in FOXP3+ regulatory T cells (Tregs) play a key role in many immune-mediated diseases including immune dysregulation polyendocrinopathy, enteropathy, X-linked (IPEX) syndrome. Previously, we demonstrated CD4+ T cells from healthy donors and IPEX patients can be converted into functional Treg-like cells by lentiviral transfer of FOXP3 (CD4LVFOXP3). These CD4LVFOXP3 cells have potent regulatory function, suggesting their potential as an innovative therapeutic. Here, we present molecular and preclinical in vivo data supporting CD4LVFOXP3 cell clinical progression. The molecular characterisation of CD4LVFOXP3 cells included flow cytometry, qPCR, RNA-seq and TCR-seq. The in vivo suppressive function of CD4LVFOXP3 cells was assessed in xenograft-versus-host disease (xeno-GvHD) and FOXP3-deficient IPEX-like humanised mouse models. The safety of CD4LVFOXP3 cells was evaluated using peripheral blood (PB) humanised (hu)-mice, testing their impact on immune response against pathogens and immune surveillance against tumor antigens. We demonstrate that the conversion of CD4+ T cells to CD4LVFOXP3 cells leads to specific transcriptional changes as compared to CD4+ T-cell transduction in the absence of FOXP3, including upregulation of Treg-related genes. Furthermore, we observe specific preservation of a polyclonal TCR repertoire during in vitro cell production. Both allogeneic and autologous CD4LVFOXP3 cells protect from xeno-GvHD after two sequential infusions of effector T cells. CD4LVFOXP3 cells prevent hyper-proliferation of CD4+ memory T cells in the FOXP3-deficient IPEX-like hu-mice. CD4LVFOXP3 cells do not impede in vivo expansion of antigen-primed T cells or tumor clearance in the PB hu-mice. These data support the clinical readiness of CD4LVFOXP3 cells to treat IPEX syndrome and other immune-mediated diseases caused by insufficient or dysfunctional FOXP3+ Tregs.

    View details for DOI 10.1002/cti2.1214

    View details for PubMedID 33304583

    View details for PubMedCentralID PMC7688376

  • Meta-analytic approach for transcriptome profiling of herpes simplex virus type 1. Scientific data Tombácz, D., Torma, G., Gulyás, G., Moldován, N., Snyder, M., Boldogkői, Z. 2020; 7 (1): 223

    Abstract

    In this meta-analysis, we re-analysed and compared herpes simplex virus type 1 transcriptomic data generated by eight studies using various short- and long-read sequencing techniques and different library preparation methods. We identified a large number of novel mRNAs, non-coding RNAs and transcript isoforms, and validated many previously published transcripts. Here, we present the most complete HSV-1 transcriptome to date. Furthermore, we also demonstrate that various sequencing techniques, including both cDNA and direct RNA sequencing approaches, are error-prone, which can be circumvented by using integrated approaches. This work draws attention to the need for using multiple sequencing approaches and meta-analyses in transcriptome profiling studies to obtain reliable results.

    View details for DOI 10.1038/s41597-020-0558-8

    View details for PubMedID 32647284

  • Chromosome-level de novo assembly of the pig-tailed macaque genome using linked-read sequencing and HiC proximity scaffolding. GigaScience Roodgar, M., Babveyh, A., Nguyen, L. H., Zhou, W., Sinha, R., Lee, H., Hanks, J. B., Avula, M., Jiang, L., Jian, R., Lee, H., Song, G., Chaib, H., Weissman, I. L., Batzoglou, S., Holmes, S., Smith, D. G., Mankowski, J. L., Prost, S., Snyder, M. P. 2020; 9 (7)

    Abstract

    Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we used a combination of de novo linked-read assembly and scaffolding using proximity ligation assay (HiC) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large scaffolds at chromosome level with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among 3 macaque species, human, and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.

    View details for DOI 10.1093/gigascience/giaa069

    View details for PubMedID 32649757
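
The scaffold N50 quoted in the pig-tailed macaque abstract above is a standard assembly-contiguity statistic: the length L such that scaffolds of length L or longer together hold at least half of the assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; N50 is the length of
    the scaffold at which the running total first reaches half of the
    total assembly size.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths (arbitrary units), not from the macaque assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50 of toy assembly:", n50(toy_lengths))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness, which is one reason the abstract also reports how much of the genome the 23 largest scaffolds cover.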

  • PPARγ-p53-Mediated Vasculoregenerative Program to Reverse Pulmonary Hypertension. Circulation research Hennigs, J. K., Cao, A., Li, C. G., Shi, M., Mienert, J., Miyagawa, K., Körbelin, J., Marciano, D. P., Chen, P. I., Roughley, M., Elliott, M. V., Harper, R. L., Bill, M. A., Chappell, J., Moonen, J. R., Diebold, I., Wang, L., Snyder, M. P., Rabinovitch, M. 2020

    Abstract

    Rationale: In pulmonary arterial hypertension (PAH), endothelial dysfunction and obliterative vascular disease are associated with DNA damage and impaired signaling of bone morphogenetic protein type 2 receptor (BMPR2) via two downstream transcription factors, PPARγ and p53. Objective: We investigated the vasculoprotective and regenerative potential of a newly identified PPARγ-p53 transcription factor complex in the pulmonary endothelium. Methods and Results: In this study, we identified a pharmacologically inducible vasculoprotective mechanism in pulmonary arterial (PA) and lung microvascular (MV) endothelial cells (EC) in response to DNA damage and oxidant stress regulated in part by a BMPR2-dependent transcription factor complex between PPARγ and p53. Chromatin immunoprecipitation (ChIP) sequencing (seq) and RNA-seq established an inducible PPARγ-p53-mediated regenerative program regulating 19 genes involved in lung EC survival, angiogenesis and DNA repair, including EPHA2, FHL2, JAG1, SULF2 and TIGAR. Expression of these genes was partially impaired when the PPARγ-p53 complex was pharmacologically disrupted or when BMPR2 was reduced in PAEC subjected to oxidative stress. In EC-Bmpr2-knockout mice unable to stabilize p53 in ECs under oxidative stress, Nutlin-3 rescued endothelial p53 and PPARγ-p53 complex formation and induced target genes, such as APLN and JAG1, to regenerate pulmonary microvessels and reverse pulmonary hypertension. In PAEC from BMPR2 mutant PAH patients, pharmacological induction of p53 and PPARγ-p53 genes repaired damaged DNA utilizing genes from the nucleotide excision repair pathway without provoking PAEC apoptosis. Conclusions: We identified a novel therapeutic strategy that activates a vasculoprotective gene regulation program in PAEC downstream of dysfunctional BMPR2 to rehabilitate PAH PAEC, regenerate pulmonary microvessels and reverse disease. Our studies pave the way for p53-based vasculoregenerative therapies for PAH by extending the therapeutic focus to PAEC dysfunction and to DNA damage associated with PAH progression.

    View details for DOI 10.1161/CIRCRESAHA.119.316339

    View details for PubMedID 33322916

  • Obesity Drives Delayed Infarct Expansion, Inflammation, and Distinct Gene Networks in a Mouse Stroke Model. Translational stroke research Peterson, T. C., Lechtenberg, K. J., Piening, B. D., Lucas, T. A., Wei, E., Chaib, H., Dowdell, A. K., Snyder, M., Buckwalter, M. S. 2020

    Abstract

    Obesity is associated with chronic peripheral inflammation, is a risk factor for stroke, and causes increased infarct sizes. To characterize how obesity increases infarct size, we fed a high-fat diet to wild-type C57BL/6J mice for either 6 weeks or 15 weeks and then induced distal middle cerebral artery strokes. We found that infarct expansion happened late after stroke. There were no differences in cortical neuroinflammation (astrogliosis, microgliosis, or pro-inflammatory cytokines) either prior to or 10 h after stroke, and also no differences in stroke size at 10 h. However, by 3 days after stroke, animals fed a high-fat diet had a dramatic increase in microgliosis and astrogliosis that was associated with larger strokes and worsened functional recovery. RNA sequencing revealed a dramatic increase in inflammatory genes in the high-fat diet-fed animals 3 days after stroke that were not present prior to stroke. Genetic pathways unique to diet-induced obesity were primarily related to adaptive immunity, extracellular matrix components, cell migration, and vasculogenesis. The late appearance of neuroinflammation and infarct expansion indicates that there may be a therapeutic window between 10 and 36 h after stroke where inflammation and obesity-specific transcriptional programs could be targeted to improve outcomes in people with obesity and stroke.

    View details for DOI 10.1007/s12975-020-00826-9

    View details for PubMedID 32588199

  • Template-switching artifacts resemble alternative polyadenylation. BMC genomics Balazs, Z., Tombacz, D., Csabai, Z., Moldovan, N., Snyder, M., Boldogkoi, Z. 2019; 20 (1): 824

    Abstract

    BACKGROUND: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. RESULTS: Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. CONCLUSIONS: Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.

    View details for DOI 10.1186/s12864-019-6199-7

    View details for PubMedID 31703623
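
The template-switching abstract above notes that artifactual polyadenylation sites can arise at runs of as few as three adenines. As a rough illustration of that kind of check, the sketch below flags candidate sites whose downstream genomic sequence contains an A-run of at least a chosen length. The function names, the 10-base window, and the toy sequence are hypothetical assumptions, and the sequence is assumed to be given in transcript orientation; this is not the filtering algorithm developed in the paper.

```python
def downstream_a_run(genome_seq, site, window=10):
    """Longest run of consecutive 'A' bases in the window starting at site.

    site is a 0-based index of the first genomic base after the candidate
    cleavage position; window is an assumed number of bases to inspect.
    """
    region = genome_seq[site:site + window].upper()
    longest = current = 0
    for base in region:
        current = current + 1 if base == "A" else 0
        longest = max(longest, current)
    return longest


def flag_suspect_sites(genome_seq, sites, min_run=3, window=10):
    """Return the candidate sites followed by an A-run of at least min_run."""
    return [
        site for site in sites
        if downstream_a_run(genome_seq, site, window) >= min_run
    ]


# Toy sequence (made up): an AAAA stretch begins at index 5.
toy_seq = "GCTGCAAAAGTCGTCGTCGCTTGCA"
print(flag_suspect_sites(toy_seq, [5, 12]))
```

Sites flagged this way are only suspects; the abstract's point is that such cDNA-derived sites are best confirmed against direct RNA sequencing data.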

  • Big data and health LANCET DIGITAL HEALTH Snyder, M., Zhou, W. 2019; 1 (6): E252–E254
  • Genome-wide effects of social status on DNA methylation in the brain of a cichlid fish, Astatotilapia burtoni. BMC genomics Hilliard, A. T., Xie, D., Ma, Z., Snyder, M. P., Fernald, R. D. 2019; 20 (1): 699

    Abstract

    BACKGROUND: Successful social behavior requires real-time integration of information about the environment, internal physiology, and past experience. The molecular substrates of this integration are poorly understood, but likely modulate neural plasticity and gene regulation. In the cichlid fish species Astatotilapia burtoni, male social status can shift rapidly depending on the environment, causing fast behavioral modifications and a cascade of changes in gene transcription, the brain, and the reproductive system. These changes can be permanent but are also reversible, implying the involvement of a robust but flexible mechanism that regulates plasticity based on internal and external conditions. One candidate mechanism is DNA methylation, which has been linked to social behavior in many species, including A. burtoni. However, the extent of its effects after social change in A. burtoni was previously unknown. RESULTS: We performed the first genome-wide search for DNA methylation patterns associated with social status in the brains of male A. burtoni, identifying hundreds of Differentially Methylated genomic Regions (DMRs) in dominant versus non-dominant fish. Most DMRs were inside genes supporting neural development, synapse function, and other processes relevant to neural plasticity, and DMRs could affect gene expression in multiple ways. DMR genes were more likely to be transcription factors, have a duplicate elsewhere in the genome, have an anti-sense lncRNA, and have more splice variants than other genes. Dozens of genes had multiple DMRs that were often seemingly positioned to regulate specific splice variants. CONCLUSIONS: Our results revealed genome-wide effects of A. burtoni social status on DNA methylation in the brain and strongly suggest a role for methylation in modulating plasticity across multiple biological levels. They also suggest many novel hypotheses to address in mechanistic follow-up studies, and will be a rich resource for identifying the relationships between behavioral, neural, and transcriptional plasticity in the context of social status.

    View details for DOI 10.1186/s12864-019-6047-9

    View details for PubMedID 31506062

  • Systematic Identification of Host Cell Regulators of Legionella pneumophila Pathogenesis Using a Genome-wide CRISPR Screen. Cell host & microbe Jeng, E. E., Bhadkamkar, V., Ibe, N. U., Gause, H., Jiang, L., Chan, J., Jian, R., Jimenez-Morales, D., Stevenson, E., Krogan, N. J., Swaney, D. L., Snyder, M. P., Mukherjee, S., Bassik, M. C. 2019

    Abstract

    During infection, Legionella pneumophila translocates over 300 effector proteins into the host cytosol, allowing the pathogen to establish an endoplasmic reticulum (ER)-like Legionella-containing vacuole (LCV) that supports bacterial replication. Here, we perform a genome-wide CRISPR-Cas9 screen and secondary targeted screens in U937 human monocyte/macrophage-like cells to systematically identify host factors that regulate killing by L. pneumophila. The screens reveal known host factors hijacked by L. pneumophila, as well as genes spanning diverse trafficking and signaling pathways previously not linked to L. pneumophila pathogenesis. We further characterize C1orf43 and KIAA1109 as regulators of phagocytosis and show that RAB10 and its chaperone RABIF are required for optimal L. pneumophila replication and ER recruitment to the LCV. Finally, we show that Rab10 protein is recruited to the LCV and ubiquitinated by the effectors SidC/SdcA. Collectively, our results provide a wealth of previously undescribed insights into L. pneumophila pathogenesis and mammalian cell function.

    View details for DOI 10.1016/j.chom.2019.08.017

    View details for PubMedID 31540829

  • Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes. Cell Sberro, H., Fremin, B. J., Zlitni, S., Edfors, F., Greenfield, N., Snyder, M. P., Pavlopoulos, G. A., Kyrpides, N. C., Bhatt, A. S. 2019

    Abstract

    Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; 30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, defense-related, and protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.

    View details for DOI 10.1016/j.cell.2019.07.016

    View details for PubMedID 31402174

  • Simultaneous RNA purification and size selection using on-chip isotachophoresis with an ionic spacer. Lab on a chip Han, C. M., Catoe, D., Munro, S. A., Khnouf, R., Snyder, M. P., Santiago, J. G., Salit, M. L., Cenik, C. 2019

    Abstract

    We present an on-chip method for the extraction of RNA within a specific size range from low-abundance samples. We use isotachophoresis (ITP) with an ionic spacer and a sieving matrix to enable size-selection with a high yield of RNA in the target size range. The spacer zone separates two concentrated ITP peaks, the first containing unwanted single nucleotides and the second focusing RNA of the target size range (2-35 nt). Our ITP method excludes >90% of single nucleotides and >65% of longer RNAs (>35 nt). Compared to size selection using gel electrophoresis, ITP-based size-selection yields a 2.2-fold increase in the amount of extracted RNAs within the target size range. We also demonstrate compatibility of the ITP-based size-selection with downstream next generation sequencing. On-chip ITP-prepared samples reveal higher reproducibility of transcript-specific measurements compared to samples size-selected by gel electrophoresis. Our method offers an attractive alternative to conventional sample preparation for sequencing with shorter assay time, higher extraction efficiency and reproducibility. Potential applications of ITP-based size-selection include sequencing-based analyses of small RNAs from low-abundance samples such as rare cell types, samples from fluorescence activated cell sorting (FACS), or limited clinical samples.

    View details for DOI 10.1039/c9lc00311h

    View details for PubMedID 31328753

  • MISTERMINATE Mechanistically Links Mitochondrial Dysfunction with Proteostasis Failure. Molecular cell Wu, Z., Tantray, I., Lim, J., Chen, S., Li, Y., Davis, Z., Sitron, C., Dong, J., Gispert, S., Auburger, G., Brandman, O., Bi, X., Snyder, M., Lu, B. 2019

    Abstract

    Mitochondrial dysfunction and proteostasis failure frequently coexist as hallmarks of neurodegenerative disease. How these pathologies are related is not well understood. Here, we describe a phenomenon termed MISTERMINATE (mitochondrial-stress-induced translational termination impairment and protein carboxyl terminal extension), which mechanistically links mitochondrial dysfunction with proteostasis failure. We show that mitochondrial dysfunction impairs translational termination of nuclear-encoded mitochondrial mRNAs, including complex-I 30kD subunit (C-I30) mRNA, occurring on the mitochondrial surface in Drosophila and mammalian cells. Ribosomes stalled at the normal stop codon continue to add to the C terminus of C-I30 certain amino acids not encoded by the mRNA template. C-terminally extended C-I30 is toxic when assembled into C-I and forms aggregates in the cytosol. Enhancing co-translational quality control prevents C-I30 C-terminal extension and rescues mitochondrial and neuromuscular degeneration in a Parkinson's disease model. These findings emphasize the importance of efficient translation termination and reveal an unexpected link between mitochondrial health and proteome homeostasis mediated by MISTERMINATE.

    View details for DOI 10.1016/j.molcel.2019.06.031

    View details for PubMedID 31378462

  • Matrix stiffness induces a tumorigenic phenotype in mammary epithelium through changes in chromatin accessibility. Nature biomedical engineering Stowers, R. S., Shcherbina, A., Israeli, J., Gruber, J. J., Chang, J., Nam, S., Rabiee, A., Teruel, M. N., Snyder, M. P., Kundaje, A., Chaudhuri, O. 2019

    Abstract

    In breast cancer, the increased stiffness of the extracellular matrix is a key driver of malignancy. Yet little is known about the epigenomic changes that underlie the tumorigenic impact of extracellular matrix mechanics. Here, we show in a three-dimensional culture model of breast cancer that stiff extracellular matrix induces a tumorigenic phenotype through changes in chromatin state. We found that increased stiffness yielded cells with more wrinkled nuclei and with increased lamina-associated chromatin, that cells cultured in stiff matrices displayed more accessible chromatin sites, which exhibited footprints of Sp1 binding, and that this transcription factor acts along with the histone deacetylases 3 and 8 to regulate the induction of stiffness-mediated tumorigenicity. Just as cell culture on soft environments or in them rather than on tissue-culture plastic better recapitulates the acinar morphology observed in mammary epithelium in vivo, mammary epithelial cells cultured on soft microenvironments or in them also more closely replicate the in vivo chromatin state. Our results emphasize the importance of culture conditions for epigenomic studies, and reveal that chromatin state is a critical mediator of mechanotransduction.

    View details for DOI 10.1038/s41551-019-0420-5

    View details for PubMedID 31285581

  • Long-Read Sequencing - A Powerful Tool in Viral Transcriptome Research TRENDS IN MICROBIOLOGY Boldogkoi, Z., Moldovan, N., Balazs, Z., Snyder, M., Tombacz, D. 2019; 27 (7): 578–92
  • Comment on 'AIRE-deficient patients harbor unique high-affinity disease-ameliorating autoantibodies'. eLife Landegren, N., Rosen, L. B., Freyhult, E., Eriksson, D., Fall, T., Smith, G., Ferre, E. M., Brodin, P., Sharon, D., Snyder, M., Lionakis, M., Anderson, M., Kampe, O. 2019; 8

    Abstract

    The AIRE gene plays a key role in the development of central immune tolerance by promoting thymic presentation of tissue-specific molecules. Patients with AIRE-deficiency develop multiple autoimmune manifestations and display autoantibodies against the affected tissues. In 2016 it was reported that: i) the spectrum of autoantibodies in patients with AIRE-deficiency is much broader than previously appreciated; ii) neutralizing autoantibodies to type I interferons (IFNs) could provide protection against type 1 diabetes in these patients (Meyer et al., 2016). We attempted to replicate these new findings using a similar experimental approach in an independent patient cohort, and found no evidence for either conclusion.

    View details for DOI 10.7554/eLife.43578

    View details for PubMedID 31244471

  • Engineering Genetic Predisposition in Human Neuroepithelial Stem Cells Recapitulates Medulloblastoma Tumorigenesis. Cell stem cell Huang, M., Tailor, J., Zhen, Q., Gillmor, A. H., Miller, M. L., Weishaupt, H., Chen, J., Zheng, T., Nash, E. K., McHenry, L. K., An, Z., Ye, F., Takashima, Y., Clarke, J., Ayetey, H., Cavalli, F. M., Luu, B., Moriarity, B. S., Ilkhanizadeh, S., Chavez, L., Yu, C., Kurian, K. M., Magnaldo, T., Sevenet, N., Koch, P., Pollard, S. M., Dirks, P., Snyder, M. P., Largaespada, D. A., Cho, Y. J., Phillips, J. J., Swartling, F. J., Morrissy, A. S., Kool, M., Pfister, S. M., Taylor, M. D., Smith, A., Weiss, W. A. 2019

    Abstract

    Human neural stem cell cultures provide progenitor cells that are potential cells of origin for brain cancers. However, the extent to which genetic predisposition to tumor formation can be faithfully captured in stem cell lines is uncertain. Here, we evaluated neuroepithelial stem (NES) cells, representative of cerebellar progenitors. We transduced NES cells with MYCN, observing medulloblastoma upon orthotopic implantation in mice. Significantly, transcriptomes and patterns of DNA methylation from xenograft tumors were globally more representative of human medulloblastoma compared to a MYCN-driven genetically engineered mouse model. Orthotopic transplantation of NES cells generated from Gorlin syndrome patients, who are predisposed to medulloblastoma due to germline-mutated PTCH1, also generated medulloblastoma. We engineered candidate cooperating mutations in Gorlin NES cells, with mutation of DDX3X or loss of GSE1 both accelerating tumorigenesis. These findings demonstrate that human NES cells provide a potent experimental resource for dissecting genetic causation in medulloblastoma.

    View details for DOI 10.1016/j.stem.2019.05.013

    View details for PubMedID 31204176

  • Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis (vol 6, 8035, 2015) NATURE COMMUNICATIONS Fotiou, E., Martin-Almedina, S., Simpson, M. A., Lin, S., Gordon, K., Brice, G., Atton, G., Jeffery, I., Rees, D. C., Mignot, C., Vogt, J., Homfray, T., Snyder, M. P., Rockson, S. G., Jeffery, S., Mortimer, P. S., Mansour, S., Ostergaard, P. 2019; 10
  • Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe CANADIAN JOURNAL OF INFECTIOUS DISEASES & MEDICAL MICROBIOLOGY Csabai, Z., Tombacz, D., Deim, Z., Snyder, M., Boldogkoi, Z. 2019; 2019
  • Much ado about nothing: A qualitative study of the experiences of an average-risk population receiving results of exome sequencing JOURNAL OF GENETIC COUNSELING Rego, S., Dagan-Rosenfeld, O., Bivona, S. A., Snyder, M. P., Ormond, K. E. 2019; 28 (2): 428–37

    View details for DOI 10.1002/jgc4.1096

    View details for Web of Science ID 000463993600026

  • Much ado about nothing: A qualitative study of the experiences of an average-risk population receiving results of exome sequencing. Journal of genetic counseling Rego, S., Dagan-Rosenfeld, O., Bivona, S. A., Snyder, M. P., Ormond, K. E. 2019

    Abstract

    The increasing availability of exome sequencing to the general ("healthy") population raises questions about the implications of genomic testing for individuals without suspected Mendelian diseases. Little is known about this population's motivations for undergoing exome sequencing, their expectations, reactions, and perceptions of utility. In order to address these questions, we conducted in-depth semi-structured interviews with 12 participants recruited from a longitudinal multi-omics profiling study that included exome sequencing. Participants were interviewed after receiving exome results, which included Mendelian disease-associated pathogenic and likely pathogenic variants, pharmacogenetic variants, and risk assessments for multifactorial diseases such as type 2 diabetes. The primary motivation driving participation in exome sequencing was personal curiosity. While they reported feeling validation and relief, participants were frequently underwhelmed by the results and described having expected more from exome sequencing. All participants reported discussing the results with at least some family, friends, and healthcare providers. Participants' recollection of the results returned to them was sometimes incorrect or incomplete, in many cases aligning with their perceptions of their health risks when entering the study. These results underscore the need for different genetic counseling approaches for generally healthy patients undergoing exome sequencing, in particular the need to provide anticipatory guidance to moderate participants' expectations. They also provide a preview of potential challenges clinicians may face as genomic sequencing continues to scale-up in the general population despite a lack of full understanding of its impact.

    View details for PubMedID 30835913

  • Multi-Omics Profiling, Microscopic Cervical Remodeling, and Parturition: Insights from the Smart Diaphragm Study. Liang, L., Dunn, J. P., Chen, S., Tsai, M., Hornburg, D., Newmann, S., Avina, M., Leng, Y., Holman, R., Lee, T. H., Qureshi, S., Montelongo, E., Zhao, B., Jeliffe, L., Snyder, M., Rand, L. SAGE PUBLICATIONS INC. 2019: 216A
  • Applying circulating tumor DNA methylation in the diagnosis of lung cancer. Precision clinical medicine Li, L., Fu, K., Zhou, W., Snyder, M. 2019; 2 (1): 45-56

    Abstract

    Lung cancer is the leading cause of cancer-related deaths worldwide. Low dose computed tomography (LDCT) is commonly used for disease screening, with identified candidate cancerous regions further diagnosed using tissue biopsy. However, existing techniques are all invasive and unavoidably cause multiple complications. In contrast, liquid biopsy is a noninvasive, ideal surrogate for tissue biopsy that can identify circulating tumor DNA (ctDNA) containing tumorigenic signatures. It has been successfully implemented to assist treatment decisions and disease outcome prediction. ctDNA methylation, a type of liquid biopsy that profiles critical epigenetic alterations occurring during carcinogenesis, has gained increasing attention. Indeed, aberrant ctDNA methylation occurs at early stages in lung malignancy and therefore can be used as an alternative for the early diagnosis of lung cancer. In this review, we give a brief synopsis of the biological basis and detection techniques of ctDNA methylation. We then summarize the latest progress in the use of ctDNA methylation as a diagnostic biomarker. Lastly, we discuss the major issues that limit application of ctDNA methylation in the clinic, and propose possible solutions to enhance its usage.

    View details for DOI 10.1093/pcmedi/pbz003

    View details for PubMedID 35694699

    View details for PubMedCentralID PMC8985769

  • Windows Into Human Health Through Wearables Data Analytics. Current opinion in biomedical engineering Witt, D., Kellogg, R., Snyder, M., Dunn, J. 2019; 9: 28–46

    Abstract

    Background: Wearable sensors (wearables) have been commonly integrated into a wide variety of commercial products and are increasingly being used to collect and process raw physiological parameters into salient digital health information. The data collected by wearables are currently being investigated across a broad set of clinical domains and patient populations. There is significant research occurring in the domain of algorithm development, with the aim of translating raw sensor data into fitness- or health-related outcomes of interest for users, patients, and health care providers. Objectives: The aim of this review is to highlight a selected group of fitness- and health-related indicators from wearables data and to describe several algorithmic approaches used to generate these higher order indicators. Methods: A systematic search of the PubMed database was performed with the following search terms (number of records in parentheses): Fitbit algorithm (18), Apple Watch algorithm (3), Garmin algorithm (5), Microsoft Band algorithm (8), Samsung Gear algorithm (2), Xiaomi MiBand algorithm (1), Huawei Band (Watch) algorithm (2), photoplethysmography algorithm (465), accelerometry algorithm (966), ECG algorithm (8287), continuous glucose monitor algorithm (343). The search terms chosen for this review are focused on algorithms for wearable devices that dominated the commercial wearables market between 2014 and 2017 and that were highly represented in the biomedical literature. A second set of search terms included categories of algorithms for fitness-related and health-related indicators that are commonly used in wearable devices (e.g. accelerometry, PPG, ECG). These papers covered the following domain areas: fitness; exercise; movement; physical activity; step count; walking; running; swimming; energy expenditure; atrial fibrillation; arrhythmia; cardiovascular; autonomic nervous system; neuropathy; heart rate variability; fall detection; trauma; behavior change; diet; eating; stress detection; serum glucose monitoring; continuous glucose monitoring; diabetes mellitus type 1; diabetes mellitus type 2. All studies uncovered through this search on commercially available device algorithms and pivotal studies on sensor algorithm development were summarized, and a summary table was constructed using references generated by the literature review as described (Table 1). Conclusions: Wearable health technologies aim to collect and process raw physiological or environmental parameters into salient digital health information. Much of the current and future utility of wearables lies in the signal processing steps and algorithms used to analyze large volumes of data. Continued algorithmic development and advances in machine learning techniques will further increase analytic capabilities. In the context of these advances, our review aims to highlight a range of advances in fitness- and other health-related indicators provided by current wearable technologies.

    View details for DOI 10.1016/j.cobme.2019.01.001

    View details for PubMedID 31832566

  • Lifelong physical activity is associated with promoter hypomethylation of genes involved in metabolism, myogenesis, contractile properties and oxidative stress resistance in aged human skeletal muscle. Scientific reports Sailani, M. R., Halling, J. F., Moller, H. D., Lee, H., Plomgaard, P., Pilegaard, H., Snyder, M. P., Regenberg, B. 2019; 9 (1): 3272

    Abstract

    Lifelong regular physical activity is associated with reduced risk of type 2 diabetes (T2D), maintenance of muscle mass and increased metabolic capacity. However, little is known about epigenetic mechanisms that might contribute to these beneficial effects in aged individuals. We investigated the effect of lifelong physical activity on global DNA methylation patterns in skeletal muscle of healthy aged men, who had either performed regular exercise or remained sedentary their entire lives (average age 62 years). DNA methylation was significantly lower in 714 promoters of the physically active than inactive men while methylation of introns, exons and CpG islands was similar in the two groups. Promoters for genes encoding critical insulin-responsive enzymes in glycogen metabolism, glycolysis and TCA cycle were hypomethylated in active relative to inactive men. Hypomethylation was also found in promoters of myosin light chain, dystrophin, actin polymerization, PAK regulatory genes and oxidative stress response genes. A cluster of genes regulated by GSK3beta-TCF7L2 also displayed promoter hypomethylation. Together, our results suggest that lifelong physical activity is associated with DNA methylation patterns that potentially allow for increased insulin sensitivity and a higher expression of genes in energy metabolism, myogenesis, contractile properties and oxidative stress resistance in skeletal muscle of aged individuals.

    View details for PubMedID 30824849

  • Long-Read Sequencing - A Powerful Tool in Viral Transcriptome Research. Trends in microbiology Boldogkoi, Z., Moldovan, N., Balazs, Z., Snyder, M., Tombacz, D. 2019

    Abstract

    Long-read sequencing (LRS) has become increasingly popular due to its strengths in de novo assembly and in resolving complex DNA regions as well as in determining full-length RNA molecules. Two important LRS technologies have been developed during the past few years, including single-molecule, real-time sequencing by Pacific Biosciences, and nanopore sequencing by Oxford Nanopore Technologies. Although current LRS methods produce lower coverage, and are more error prone than short-read sequencing, these methods continue to be superior in identifying transcript isoforms including multispliced RNAs and transcript-length variants as well as overlapping transcripts and alternative polycistronic RNA molecules. Viruses have small, compact genomes and therefore these organisms are ideal subjects for transcriptome analysis with the relatively low-throughput LRS techniques. Recent LRS studies have multiplied the number of previously known transcripts and have revealed complex networks of transcriptional overlaps in the examined viruses.

    View details for PubMedID 30824172

  • 2017 NIH-wide workshop report on "The Human Microbiome: Emerging Themes at the Horizon of the 21st Century" MICROBIOME Alm, E., Borenstein, E., Britton, R. A., Bultman, S. J., Chang, E. B., Cho, M., Dantas, G., Dominguez-Bello, M., Donovan, S. M., Dorrestein, P., Douglas, A. E., Gewirtz, A., Ghannoum, M., Goodman, A. L., Gordon, J. I., Huffnagle, G. B., Jenq, R. R., Jia, W., Knight, R., Koropatkin, N., Lampe, J. W., Lu, T., Ochman, H., Pamer, E. G., Patterson, A. D., Philpott, D., Pollard, K. S., Rawls, J. F., Salzman, N. H., Sears, C. L., Stappenbeck, T., Taga, M. E., Turnbaugh, P. J., Wang, H. H., Wu, G. D., Xavier, R. J., 2017 NIH-Wide Microbiome Workshop 2019; 7: 32

    Abstract

    The National Institutes of Health (NIH) organized a three-day human microbiome research workshop, August 16-18, 2017, to highlight the accomplishments of the 10-year Human Microbiome Project program, the outcomes of the investments made by the 21 NIH Institutes and Centers which now fund this area, and the technical challenges and knowledge gaps which will need to be addressed in order for this field to advance over the next 10 years. This report summarizes the key points in the talks, round table discussions, and Joint Agency Panel from this workshop.

    View details for DOI 10.1186/s40168-019-0627-4

    View details for Web of Science ID 000459927100002

    View details for PubMedID 30808401

    View details for PubMedCentralID PMC6391828

  • Whole-exome sequencing data of suicide victims who had suffered from major depressive disorder. Scientific data Tombacz, D., Maroti, Z., Kalmar, T., Palkovits, M., Snyder, M., Boldogkoi, Z. 2019; 6: 190010

    Abstract

    Suicide is one of the leading causes of mortality worldwide; it causes the death of more than one million patients each year. Suicide is a complex, multifactorial phenotype with environmental and genetic factors contributing to the risk of the forthcoming suicide. These factors first generally lead to mental disorders, such as depression, schizophrenia and bipolar disorder, which then become the direct cause of suicide. Here we present a high quality dataset (including processed BAM and VCF files) gained from the high-throughput whole-exome Illumina sequencing of 23 suicide victims - all of whom had suffered from major depressive disorder - and 21 control patients to a depth of at least 40-fold coverage in both cohorts. We identified ~130,000 variants per sample and altogether 442,270 unique variants in the cohort of 44 samples. To the best of our knowledge, this is the first whole-exome sequencing dataset from suicide victims. We expect that this dataset provides useful information for genomic studies of suicide and depression, and also for the analysis of the Hungarian population.

    View details for PubMedID 30720799

  • Smooth Muscle Contact Drives Endothelial Regeneration by BMPR2-Notch1-Mediated Metabolic and Epigenetic Changes CIRCULATION RESEARCH Miyagawa, K., Shi, M., Chen, P., Hennigs, J. K., Zhao, Z., Wang, M., Li, C. G., Saito, T., Taylor, S., Sa, S., Cao, A., Wang, L., Snyder, M. P., Rabinovitch, M. 2019; 124 (2): 211–24
  • Activation of PDGF pathway links LMNA mutation to dilated cardiomyopathy. Nature Lee, J., Termglinchan, V., Diecke, S., Itzhaki, I., Lam, C. K., Garg, P., Lau, E., Greenhaw, M., Seeger, T., Wu, H., Zhang, J. Z., Chen, X., Gil, I. P., Ameen, M., Sallam, K., Rhee, J. W., Churko, J. M., Chaudhary, R., Chour, T., Wang, P. J., Snyder, M. P., Chang, H. Y., Karakikes, I., Wu, J. C. 2019

    Abstract

    Lamin A/C (LMNA) is one of the most frequently mutated genes associated with dilated cardiomyopathy (DCM). DCM related to mutations in LMNA is a common inherited cardiomyopathy that is associated with systolic dysfunction and cardiac arrhythmias. Here we modelled the LMNA-related DCM in vitro using patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). Electrophysiological studies showed that the mutant iPSC-CMs displayed aberrant calcium homeostasis that led to arrhythmias at the single-cell level. Mechanistically, we show that the platelet-derived growth factor (PDGF) signalling pathway is activated in mutant iPSC-CMs compared to isogenic control iPSC-CMs. Conversely, pharmacological and molecular inhibition of the PDGF signalling pathway ameliorated the arrhythmic phenotypes of mutant iPSC-CMs in vitro. Taken together, our findings suggest that the activation of the PDGF pathway contributes to the pathogenesis of LMNA-related DCM and point to PDGF receptor-β (PDGFRB) as a potential therapeutic target.

    View details for DOI 10.1038/s41586-019-1406-x

    View details for PubMedID 31316208

  • Phenotypically-Silent Bone Morphogenetic Protein Receptor 2 (Bmpr2) Mutations Predispose Rats to Inflammation-Induced Pulmonary Arterial Hypertension by Enhancing The Risk for Neointimal Transformation. Circulation Tian, W., Jiang, X., Sung, Y. K., Shuffle, E., Wu, T. H., Kao, P. N., Tu, A. B., Dorfmüller, P., Cao, A., Wang, L., Peng, G., Kim, Y., Zhang, P., Chappell, J., Pasupneti, S., Dahms, P., Maguire, P., Chaib, H., Zamanian, R., Peters-Golden, M., Snyder, M. P., Voelkel, N. F., Humbert, M., Rabinovitch, M., Nicolls, M. R. 2019

    Abstract

    Bmpr2 mutations are critical risk factors for hereditary pulmonary arterial hypertension (hPAH) with approximately 20% of carriers developing disease. There is an unmet medical need to understand how environmental factors, such as inflammation, render Bmpr2 mutants susceptible to PAH. Overexpressing 5-lipoxygenase (5-LO) provokes lung inflammation and transient PAH in Bmpr2+/- mice. Accordingly, 5-LO and its metabolite, leukotriene B4 (LTB4), are candidates for the 'second hit'. The purpose of this study was to determine how 5-LO-mediated pulmonary inflammation synergized with phenotypically-silent Bmpr2 defects to elicit significant pulmonary vascular disease in rats. Monoallelic Bmpr2 mutant rats were generated and found phenotypically normal for up to one year of observation. To evaluate whether a second hit would elicit disease, animals were exposed to 5-LO-expressing adenovirus (AdAlox5), monocrotaline, SU5416, SU5416 with chronic hypoxia or chronic hypoxia alone. Bmpr2-mutant hPAH patient samples were assessed for neointimal 5-LO expression. Pulmonary artery endothelial cells (PAECs) with impaired BMPR2 signaling were exposed to increased 5-LO-mediated inflammation and were assessed for phenotypic and transcriptomic changes. Lung inflammation, induced by intratracheal delivery of AdAlox5, elicited severe PAH with intimal remodeling in Bmpr2+/- rats but not in their wild-type littermates. Neointimal lesions in the diseased Bmpr2+/- rats gained endogenous 5-LO expression associated with elevated LTB4 biosynthesis. Bmpr2-mutant hPAH patients similarly expressed 5-LO in the neointimal cells. In vitro, BMPR2 deficiency, compounded by 5-LO-mediated inflammation, generated apoptosis-resistant, and proliferative PAECs with mesenchymal characteristics. These transformed cells expressed nuclear envelope-localized 5-LO consistent with induced LTB4 production, as well as a transcriptomic signature similar to clinical disease, including upregulated NF-κB, IL-6, and TGF-β signaling pathways. The reversal of PAH and vasculopathy in Bmpr2 mutants by TGF-β antagonism suggests that TGF-β is critical for neointimal transformation. In a new 'two-hit' model of disease, lung inflammation induced severe PAH pathology in Bmpr2+/- rats. Endothelial transformation required the activation of canonical and noncanonical TGF-β signaling pathways and was characterized by 5-LO nuclear envelope translocation with enhanced LTB4 production. This study offers one explanation of how an environmental injury unleashes the destructive potential of an otherwise-silent genetic mutation.

    View details for DOI 10.1161/CIRCULATIONAHA.119.040629

    View details for PubMedID 31462075

  • Macrophage de novo NAD(+) synthesis specifies immune function in aging and inflammation NATURE IMMUNOLOGY Minhas, P. S., Liu, L., Moon, P. K., Joshi, A. U., Dove, C., Mhatre, S., Contrepois, K., Wang, Q., Lee, B. A., Coronado, M., Bernstein, D., Snyder, M. P., Migaud, M., Majeti, R., Mochly-Rosen, D., Rabinowitz, J. D., Andreasson, K. I. 2019; 20 (1): 50-+
  • Progress on Identifying and Characterizing the Human Proteome: 2019 Metrics from the HUPO Human Proteome Project. Journal of proteome research Omenn, G. S., Lane, L., Overall, C. M., Corrales, F. J., Schwenk, J. M., Paik, Y. K., Van Eyk, J. E., Liu, S., Pennington, S., Snyder, M. P., Baker, M. S., Deutsch, E. W. 2019

    Abstract

    The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2019-01-11 contains 17 694 proteins with strong protein-level evidence (PE1), compliant with HPP Guidelines for Interpretation of MS Data v2.1; these represent 89% of all 19 823 neXtProt predicted coding genes (all PE1,2,3,4 proteins), up from 17 470 one year earlier. Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalyzed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17 000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 160 publications in 2018, bringing proteomics to a broad array of biomedical research.
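    The counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper; it uses only the figures stated above, and the variable names are illustrative.

```python
# Counts quoted in the abstract (neXtProt release 2019-01-11).
pe1_proteins = 17_694      # proteins with strong protein-level evidence (PE1)
predicted_total = 19_823   # all PE1,2,3,4 predicted coding genes
pe1_previous = 17_470      # PE1 count reported one year earlier

# Share of predicted proteins that already have PE1 evidence.
pe1_share = pe1_proteins / predicted_total

# "Missing proteins" are the PE2,3,4 entries, i.e. everything that is not PE1.
missing_proteins = predicted_total - pe1_proteins

# Net gain in PE1 proteins over the year.
pe1_gain = pe1_proteins - pe1_previous

print(f"PE1 share: {pe1_share:.1%}")
print(f"Missing proteins: {missing_proteins}")
print(f"PE1 gain in one year: {pe1_gain}")
```

    The share rounds to the 89% given in the abstract, and the difference reproduces the 2129 missing proteins it reports.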

    View details for DOI 10.1021/acs.jproteome.9b00434

    View details for PubMedID 31430157

  • MACHINE LEARNING ANALYSIS OF ULTRA-DEEP WHOLE-GENOME SEQUENCING IN HUMAN BRAIN REVEALS SOMATIC GENOMIC RETROTRANSPOSITION IN GLIA AS WELL AS IN NEURONS Urban, A., Zhu, X., Zhou, B., Sloan, S., Pattni, R., Fiston-Lavier, A., Snyder, M., Petrov, D., Abyzov, A., Vaccarino, F., Barres, B., Vogel, H., Tamminga, C., Levinson, D. ELSEVIER. 2019: 1240
  • Global metabolic profiling to model biological processes of aging in twins. Aging cell Bunning, B. J., Contrepois, K., Lee-McMullen, B., Dhondalay, G. K., Zhang, W., Tupa, D., Raeber, O., Desai, M., Nadeau, K. C., Snyder, M. P., Andorf, S. 2019: e13073

    Abstract

    Aging is intimately linked to system-wide metabolic changes that can be captured in blood. Understanding biological processes of aging in humans could help maintain a healthy aging trajectory and promote longevity. We performed untargeted plasma metabolomics quantifying 770 metabolites on a cross-sectional cohort of 268 healthy individuals including 125 twin pairs covering human lifespan (from 6 months to 82 years). Unsupervised clustering of metabolic profiles revealed 6 main aging trajectories throughout life that were associated with key metabolic pathways such as progestin steroids, xanthine metabolism, and long-chain fatty acids. A random forest (RF) model was successful to predict age in adult subjects (≥16 years) using 52 metabolites (R² = 0.97). Another RF model selected 54 metabolites to classify pediatric and adult participants (out-of-bag error = 8.58%). These RF models in combination with correlation network analysis were used to explore biological processes of healthy aging. The models highlighted established metabolites, like steroids, amino acids, and free fatty acids as well as novel metabolites and pathways. Finally, we show that metabolic profiles of twins become more dissimilar with age which provides insights into nongenetic age-related variability in metabolic profiles in response to environmental exposure.

    View details for DOI 10.1111/acel.13073

    View details for PubMedID 31746094

  • Smart Diaphragm Study: Multi-omics profiling and cervical device measurements during pregnancy Liang, L., Dunn, J. P., Chen, S., Tsai, M., Hornburg, D., Newmann, S., Chung, P., Avina, M., Leng, Y., Holman, R., Lee, T. H., Berrios, S., Qureshi, S. A., Baer, R., Etemadi, M., Montelongo, E., Paynter, R., Zhao, B., Roy, S., Jelliffe, L., Snyder, M., Rand, L. MOSBY-ELSEVIER. 2019: S649
  • Personalized Metabolomics. Methods in molecular biology (Clifton, N.J.) Marciano, D. P., Snyder, M. P. 2019; 1978: 447–56

    Abstract

    The human metabolome is the cumulative product of ingested metabolites and those produced by the body and its microbiota. Together these metabolites can dynamically report on the health and disease state of an individual, as well as their response to drug treatments and other external perturbations. Profiling metabolites in human body fluids provides an opportunity to identify biomarkers and stratify patients for personalized treatments but requires the development of high-throughput approaches compatible with large cohort and longitudinal studies. Here we review in detail sample preparation and analytical liquid chromatography-mass spectrometry (LC-MS) methods to measure the broad chemical diversity of metabolites found in human plasma and urine.

    View details for DOI 10.1007/978-1-4939-9236-2_27

    View details for PubMedID 31119679

  • Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe. The Canadian journal of infectious diseases & medical microbiology = Journal canadien des maladies infectieuses et de la microbiologie medicale Csabai, Z., Tombácz, D., Deim, Z., Snyder, M., Boldogkői, Z. 2019; 2019: 1806842

    Abstract

    Pseudorabies virus (PRV) is the causative agent of Aujeszky's disease giving rise to significant economic losses worldwide. Many countries have implemented national programs for the eradication of this virus. In this study, long-read sequencing was used to determine the nucleotide sequence of the genome of a novel PRV strain (PRV-MdBio) isolated in Serbia. In this study, a novel PRV strain was isolated and characterized. PRV-MdBio was found to exhibit similar growth properties to those of another wild-type PRV, the strain Kaplan. Single-molecule real-time (SMRT) sequencing has revealed that the new strain differs significantly in base composition even from strain Kaplan, to which it otherwise exhibits the highest similarity. We compared the genetic composition of PRV-MdBio to strain Kaplan and the China reference strain Ea and obtained that radical base replacements were the most common point mutations preceding conservative and silent mutations. We also found that the adaptation of PRV to cell culture does not lead to any tendentious genetic alteration in the viral genome. PRV-MdBio is a wild-type virus, which differs in base composition from other PRV strains to a relatively large extent.

    View details for PubMedID 31093307

  • Heterogeneity in old fibroblasts is linked to variability in reprogramming and wound healing. Nature Mahmoudi, S., Mancini, E., Xu, L., Moore, A., Jahanbani, F., Hebestreit, K., Srinivasan, R., Li, X., Devarajan, K., Prélot, L., Ang, C. E., Shibuya, Y., Benayoun, B. A., Chang, A. L., Wernig, M., Wysocka, J., Longaker, M. T., Snyder, M. P., Brunet, A. 2019; 574 (7779): 553–58

    Abstract

    Age-associated chronic inflammation (inflammageing) is a central hallmark of ageing, but its influence on specific cells remains largely unknown. Fibroblasts are present in most tissues and contribute to wound healing. They are also the most widely used cell type for reprogramming to induced pluripotent stem (iPS) cells, a process that has implications for regenerative medicine and rejuvenation strategies. Here we show that fibroblast cultures from old mice secrete inflammatory cytokines and exhibit increased variability in the efficiency of iPS cell reprogramming between mice. Variability between individuals is emerging as a feature of old age, but the underlying mechanisms remain unknown. To identify drivers of this variability, we performed multi-omics profiling of fibroblast cultures from young and old mice that have different reprogramming efficiencies. This approach revealed that fibroblast cultures from old mice contain 'activated fibroblasts' that secrete inflammatory cytokines, and that the proportion of activated fibroblasts in a culture correlates with the reprogramming efficiency of that culture. Experiments in which conditioned medium was swapped between cultures showed that extrinsic factors secreted by activated fibroblasts underlie part of the variability between mice in reprogramming efficiency, and we have identified inflammatory cytokines, including TNF, as key contributors. Notably, old mice also exhibited variability in wound healing rate in vivo. Single-cell RNA-sequencing analysis identified distinct subpopulations of fibroblasts with different cytokine expression and signalling in the wounds of old mice with slow versus fast healing rates. Hence, a shift in fibroblast composition, and the ratio of inflammatory cytokines that they secrete, may drive the variability between mice in reprogramming in vitro and influence wound healing rate in vivo. This variability may reflect distinct stochastic ageing trajectories between individuals, and could help in developing personalized strategies to improve iPS cell generation and wound healing in elderly individuals.

    View details for DOI 10.1038/s41586-019-1658-5

    View details for PubMedID 31645721

  • Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics (Oxford, England) Ghaemi, M. S., DiGiulio, D. B., Contrepois, K., Callahan, B., Ngo, T. T., Lee-McMullen, B., Lehallier, B., Robaczewska, A., Mcilwain, D., Rosenberg-Hasson, Y., Wong, R. J., Quaintance, C., Culos, A., Stanley, N., Tanada, A., Tsai, A., Gaudilliere, D., Ganio, E., Han, X., Ando, K., McNeil, L., Tingle, M., Wise, P., Maric, I., Sirota, M., Wyss-Coray, T., Winn, V. D., Druzin, M. L., Gibbs, R., Darmstadt, G. L., Lewis, D. B., Partovi Nia, V., Agard, B., Tibshirani, R., Nolan, G., Snyder, M. P., Relman, D. A., Quake, S. R., Shaw, G. M., Stevenson, D. K., Angst, M. S., Gaudilliere, B., Aghaeepour, N. 2019; 35 (1): 95–103

    Abstract

    Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides the frameworks for future studies examining deviations implicated in pregnancy-related pathologies including preterm birth and preeclampsia. Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering at term. The datasets included measurements from the immunome, transcriptome, microbiome, proteome and metabolome of samples obtained simultaneously from the same patients. Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were combined into a single model. This model not only significantly increased predictive power by combining all datasets, but also revealed novel interactions between different biological modalities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo analysis of immune-modulating interventions based on the mechanisms identified. Availability and implementation: Datasets and scripts for reproduction of results are available through: https://nalab.stanford.edu/multiomics-pregnancy/. Supplementary information: Supplementary data are available at Bioinformatics online.

    View details for PubMedID 30561547

  • A machine-compiled database of genome-wide association studies. Nature communications Kuleshov, V., Ding, J., Vo, C., Hancock, B., Ratner, A., Li, Y., Ré, C., Batzoglou, S., Snyder, M. 2019; 10 (1): 3341

    Abstract

    Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases). This system represents a fully automated GWAS curation effort and is made possible by a paradigm for constructing machine learning systems called data programming. Our work represents a step towards making the curation of scientific literature more efficient using automated systems.
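    The recall and precision figures in this abstract are measured relative to existing manually curated knowledge bases. The sketch below illustrates how such figures are commonly computed; it is not the GWASkb code, and the function name and the toy (variant, phenotype) pairs are hypothetical.

```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted items against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = extracted & reference
    # Precision: fraction of extracted associations found in the reference.
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    # Recall: fraction of reference associations that were extracted.
    recall = len(true_positives) / len(reference) if reference else 0.0
    return precision, recall


# Toy (variant, phenotype) pairs, made up for illustration only.
reference = {
    ("rs1", "height"), ("rs2", "BMI"), ("rs3", "T2D"),
    ("rs4", "asthma"), ("rs5", "LDL"),
}
extracted = {
    ("rs1", "height"), ("rs2", "BMI"), ("rs3", "T2D"), ("rs9", "BMI"),
}

precision, recall = precision_recall(extracted, reference)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

    Because the reference is itself a curated and possibly incomplete collection, an extracted pair that is absent from it is not necessarily wrong, which is one reason such figures are reported as estimates.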

    View details for DOI 10.1038/s41467-019-11026-x

    View details for PubMedID 31350405

  • Mitigation of off-target toxicity in CRISPR-Cas9 screens for essential non-coding elements. Nature communications Tycko, J., Wainberg, M., Marinov, G. K., Ursu, O., Hess, G. T., Ego, B. K., Aradhana, Li, A., Truong, A., Trevino, A. E., Spees, K., Yao, D., Kaplow, I. M., Greenside, P. G., Morgens, D. W., Phanstiel, D. H., Snyder, M. P., Bintu, L., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2019; 10 (1): 4063

    Abstract

    Pooled CRISPR-Cas9 screens are a powerful method for functionally characterizing regulatory elements in the non-coding genome, but off-target effects in these experiments have not been systematically evaluated. Here, we investigate Cas9, dCas9, and CRISPRi/a off-target activity in screens for essential regulatory elements. The sgRNAs with the largest effects in genome-scale screens for essential CTCF loop anchors in K562 cells were not single guide RNAs (sgRNAs) that disrupted gene expression near the on-target CTCF anchor. Rather, these sgRNAs had high off-target activity that, while only weakly correlated with absolute off-target site number, could be predicted by the recently developed GuideScan specificity score. Screens conducted in parallel with CRISPRi/a, which do not induce double-stranded DNA breaks, revealed that a distinct set of off-targets also cause strong confounding fitness effects with these epigenome-editing tools. Promisingly, filtering of CRISPRi libraries using GuideScan specificity scores removed these confounded sgRNAs and enabled identification of essential regulatory elements.

    View details for DOI 10.1038/s41467-019-11955-7

    View details for PubMedID 31492858

  • Understanding health disparities. Journal of perinatology : official journal of the California Perinatal Association Stevenson, D. K., Wong, R. J., Aghaeepour, N., Angst, M. S., Darmstadt, G. L., DiGiulio, D. B., Druzin, M. L., Gaudilliere, B., Gibbs, R. S., B Gould, J., Katz, M., Li, J., Moufarrej, M. N., Quaintance, C. C., Quake, S. R., Relman, D. A., Shaw, G. M., Snyder, M. P., Wang, X., Wise, P. H. 2018

    Abstract

    Based upon our recent insights into the determinants of preterm birth, which is the leading cause of death in children under five years of age worldwide, we describe potential analytic frameworks that provide both a common understanding and, ultimately, the basis for effective, ameliorative action. Our research on preterm birth serves as an example that the framing of any human health condition is a result of complex interactions between the genome and the exposome. New discoveries of the basic biology of pregnancy, such as the complex immunological and signaling processes that dictate the health and length of gestation, have revealed a complexity in the interactions (current and ancestral) between genetic and environmental forces. Understanding of these relationships may help reduce disparities in preterm birth and guide productive research endeavors and, ultimately, effective clinical and public health interventions.

    View details for PubMedID 30560947

  • Cross-Platform Comparison of Untargeted and Targeted Lipidomics Approaches on Aging Mouse Plasma. Scientific reports Contrepois, K., Mahmoudi, S., Ubhi, B. K., Papsdorf, K., Hornburg, D., Brunet, A., Snyder, M. 2018; 8 (1): 17747

    Abstract

    Lipidomics - the global assessment of lipids - can be performed using a variety of mass spectrometry (MS)-based approaches. However, choosing the optimal approach in terms of lipid coverage, robustness and throughput can be a challenging task. Here, we compare a novel targeted quantitative lipidomics platform known as the Lipidyzer to a conventional untargeted liquid chromatography (LC)-MS approach. We find that both platforms are efficient in profiling more than 300 lipids across 11 lipid classes in mouse plasma with precision and accuracy below 20% for most lipids. While the untargeted and targeted platforms detect similar numbers of lipids, the former identifies a broader range of lipid classes and can unambiguously identify all three fatty acids in triacylglycerols (TAG). Quantitative measurements from both approaches exhibit a median correlation coefficient (r) of 0.99 using a dilution series of deuterated internal standards and 0.71 using endogenous plasma lipids in the context of aging. Application of both platforms to plasma from aging mouse reveals similar changes in total lipid levels across all major lipid classes and in specific lipid species. Interestingly, TAG is the lipid class that exhibits the most changes with age, suggesting that TAG metabolism is particularly sensitive to the aging process in mice. Collectively, our data show that the Lipidyzer platform provides comprehensive profiling of the most prevalent lipids in plasma in a simple and automated manner.

    View details for PubMedID 30532037

  • Progress on Identifying and Characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project JOURNAL OF PROTEOME RESEARCH Omenn, G. S., Lane, L., Overall, C. M., Corrales, F. J., Schwenk, J. M., Paik, Y., Van Eyk, J. E., Liu, S., Snyder, M., Baker, M. S., Deutsch, E. W. 2018; 17 (12): 4031–41

    Abstract

    The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17 470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17 008 in release 2017-01-23 and 13 975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16 092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15 798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.

    View details for PubMedID 30099871

  • Transcriptomic study of Herpes simplex virus type-1 using full-length sequencing techniques. Scientific data Boldogkoi, Z., Szucs, A., Balazs, Z., Sharon, D., Snyder, M., Tombacz, D. 2018; 5: 180266

    Abstract

    Herpes simplex virus type-1 (HSV-1) is a human pathogenic member of the Alphaherpesvirinae subfamily of herpesviruses. The HSV-1 genome is a large double-stranded DNA specifying about 85 protein coding genes. The latest surveys have demonstrated that the HSV-1 transcriptome is much more complex than it had been thought before. Here, we provide a long-read sequencing dataset, which was generated by using the RSII and Sequel systems from Pacific Biosciences (PacBio), as well as MinION sequencing system from Oxford Nanopore Technologies (ONT). This dataset contains 39,096 reads of inserts (ROIs) mapped to the HSV-1 genome (X14112) in RSII sequencing, while Sequel sequencing yielded 77,851 ROIs. The MinION cDNA sequencing altogether resulted in 158,653 reads, while the direct RNA-seq produced 16,516 reads. This dataset can be utilized for the identification of novel HSV RNAs and transcripts isoforms, as well as for the comparison of the quality and length of the sequencing reads derived from the currently available long-read sequencing platforms. The various library preparation approaches can also be compared with each other.

    View details for PubMedID 30480662

  • Macrophage de novo NAD+ synthesis specifies immune function in aging and inflammation. Nature immunology Minhas, P. S., Liu, L., Moon, P. K., Joshi, A. U., Dove, C., Mhatre, S., Contrepois, K., Wang, Q., Lee, B. A., Coronado, M., Bernstein, D., Snyder, M. P., Migaud, M., Majeti, R., Mochly-Rosen, D., Rabinowitz, J. D., Andreasson, K. I. 2018

    Abstract

    Recent advances highlight a pivotal role for cellular metabolism in programming immune responses. Here, we demonstrate that cell-autonomous generation of nicotinamide adenine dinucleotide (NAD+) via the kynurenine pathway (KP) regulates macrophage immune function in aging and inflammation. Isotope tracer studies revealed that macrophage NAD+ derives substantially from KP metabolism of tryptophan. Genetic or pharmacological blockade of de novo NAD+ synthesis depleted NAD+, suppressed mitochondrial NAD+-dependent signaling and respiration, and impaired phagocytosis and resolution of inflammation. Innate immune challenge triggered upstream KP activation but paradoxically suppressed cell-autonomous NAD+ synthesis by limiting the conversion of downstream quinolinate to NAD+, a profile recapitulated in aging macrophages. Increasing de novo NAD+ generation in immune-challenged or aged macrophages restored oxidative phosphorylation and homeostatic immune responses. Thus, KP-derived NAD+ operates as a metabolic switch to specify macrophage effector responses. Breakdown of de novo NAD+ synthesis may underlie declining NAD+ levels and rising innate immune dysfunction in aging and age-associated diseases.

    View details for PubMedID 30478397

  • Dynamic Transcriptome Profiling Dataset of Vaccinia Virus Obtained from Long-read Sequencing Techniques. GigaScience Tombacz, D., Prazsak, I., Szucs, A., Denes, B., Snyder, M., Boldogkoi, Z. 2018

    Abstract

    Background: Poxviruses are large DNA viruses infecting humans and animals. Vaccinia virus (VACV) has been applied as a live vaccine for immunization against smallpox, which was eradicated by 1980 as a result of worldwide vaccination. VACV is the prototype of poxviruses in the investigation of the molecular pathogenesis of the virus. Short-read sequencing methods have revolutionized transcriptomics; but, they are not efficient in distinguishing between the RNA isoforms and transcript overlaps. Long-read sequencing (LRS) is much better suited to solve these problems, and also allow direct RNA sequencing. Despite the scientific relevance of VACV, no LRS data have been generated for the viral transcriptome so far. Findings: For the deep characterization of the VACV RNA profile, various LRS platforms and library preparation approaches were applied. The raw reads were mapped to the VACV reference genome and also to the host (Chlorocebus sabaeus) genome. In this study, we applied the Pacific Biosciences RSII and Sequel platforms, which altogether resulted in 937,531 mapped reads of inserts (1.42 Gb), while we obtained 2,160,348 aligned reads (1.75 Gb) from the different library preparation methods, using the MinION device from Oxford Nanopore Technologies. Conclusions: By applying cutting-edge technologies, we were able to generate a large dataset that can serve as a valuable resource for the investigation of the dynamic VACV transcriptome, the virus-host interactions and RNA base modifications. These data can provide useful information for novel gene annotations in the VACV genome. Our dataset can also be applied for analyzing the currently available LRS platforms, library preparation methods and bioinformatics pipelines.

    View details for PubMedID 30476066

  • Smooth Muscle Contact Drives Endothelial Regeneration by BMPR2-Notch1 Mediated Metabolic and Epigenetic Changes. Circulation research Miyagawa, K., Shi, M., Chen, P., Hennigs, J. K., Zhao, Z., Wang, M., Li, C. G., Saito, T., Taylor, S., Sa, S., Cao, A., Wang, L., Snyder, M. P., Rabinovitch, M. 2018

    Abstract

    RATIONALE: Maintaining endothelial cells (EC) as a monolayer in the vessel wall depends on their metabolic state and gene expression profile, features influenced by contact with neighboring cells such as pericytes and smooth muscle cells (SMC). Failure to regenerate a normal EC monolayer in response to injury can result in occlusive neointima formation in diseases such as atherosclerosis and pulmonary arterial hypertension. OBJECTIVE: We investigated the nature and functional importance of contact-dependent communication between SMC and EC to maintain EC integrity. METHODS AND RESULTS: We found that in SMC and EC contact co-cultures, bone morphogenetic protein receptor 2 (BMPR2) is required by both cell types to produce collagen IV to activate integrin-linked kinase. This enzyme directs phospho c-Jun N-terminal kinase (p-JNK) to the EC membrane, where it stabilizes presenilin1 and releases Notch1 intracellular domain (N1ICD) to promote EC proliferation. This response is necessary for EC regeneration following carotid artery injury. It is deficient in EC-SMC Bmpr2 double heterozygous mice in association with reduced collagen IV production, decreased N1ICD and attenuated EC proliferation, but can be rescued by targeting N1ICD to EC. Deletion of EC-Notch1 in transgenic mice worsens hypoxia-induced pulmonary hypertension, in association with impaired EC regenerative function associated with loss of pre-capillary arteries. We further determined that N1ICD maintains EC proliferative capacity by increasing mitochondrial mass and by inducing the phosphofructokinase PFKFB3. ChIP-seq analyses showed that PFKFB3 is required for citrate-dependent histone acetylation (H3K27) at enhancer sites of genes regulated by the acetyl transferase p300, and by N1ICD or the N1ICD target MYC and necessary for EC proliferation and homeostasis. CONCLUSIONS: Thus, SMC-EC contact is required for activation of Notch1 by BMPR2, to coordinate metabolism with chromatin remodeling of genes that enable EC regeneration, to maintain monolayer integrity and vascular homeostasis in response to injury.

    View details for PubMedID 30582451

  • Identification of phagocytosis regulators using magnetic genome-wide CRISPR screens. Nature genetics Haney, M. S., Bohlen, C. J., Morgens, D. W., Ousey, J. A., Barkal, A. A., Tsui, C. K., Ego, B. K., Levin, R., Kamber, R. A., Collins, H., Tucker, A., Li, A., Vorselen, D., Labitigan, L., Crane, E., Boyle, E., Jiang, L., Chan, J., Rincon, E., Greenleaf, W. J., Li, B., Snyder, M. P., Weissman, I. L., Theriot, J. A., Collins, S. R., Barres, B. A., Bassik, M. C. 2018

    Abstract

    Phagocytosis is required for a broad range of physiological functions, from pathogen defense to tissue homeostasis, but the mechanisms required for phagocytosis of diverse substrates remain incompletely understood. Here, we developed a rapid magnet-based phenotypic screening strategy, and performed eight genome-wide CRISPR screens in human cells to identify genes regulating phagocytosis of distinct substrates. After validating select hits in focused miniscreens, orthogonal assays and primary human macrophages, we show that (1) the previously uncharacterized gene NHLRC2 is a central player in phagocytosis, regulating RhoA-Rac1 signaling cascades that control actin polymerization and filopodia formation, (2) very-long-chain fatty acids are essential for efficient phagocytosis of certain substrates and (3) the previously uncharacterized Alzheimer's disease-associated gene TM2D3 can preferentially influence uptake of amyloid-beta aggregates. These findings illuminate new regulators and core principles of phagocytosis, and more generally establish an efficient method for unbiased identification of cellular uptake mechanisms across diverse physiological and pathological contexts.

    View details for PubMedID 30397336

  • Systematic Screening For Environmental And Behavioral Determinants Identifies Factors Detrimental to Skeletal Health Oei, L., Wu, J., Oei, E., Rivadeneira, F., Uitterlinden, A., Ioannidis, J., Snyder, M., Patel, C. WILEY. 2018: 279
  • Evaluation of whole exome sequencing as an alternative to BeadChip and whole genome sequencing in human population genetic analysis. BMC genomics Maroti, Z., Boldogkoi, Z., Tombacz, D., Snyder, M., Kalmar, T. 2018; 19 (1): 778

    Abstract

    BACKGROUND: Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA-level include whole genome sequencing, which remains costly, and the more economical solution of array-based techniques, as these are capable of simultaneously genotyping a pre-selected set of variable DNA sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing that comprises around 1.2% of the whole genome (approximately 30 million base pairs). RESULTS: To unbiasedly compare the effect of SNP selection strategies in population genetic analysis we subsampled the variants of the same highly curated 1K Genome dataset to mimic genome, exome sequencing and array data in order to eliminate the effect of different chemistry and error profiles of these different approaches. Next we compared the application of the exome dataset to the array-based dataset and to the gold standard whole genome dataset using the same population genetic analysis methods. CONCLUSIONS: Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing provides a better alternative to the array-based methods for population genetic analysis. In this study, we propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.

    View details for PubMedID 30373510

  • Precision Medicine: Role of Proteomics in Changing Clinical Management and Care. Journal of proteome research Van Eyk, J. E., Snyder, M. P. 2018

    Abstract

    It is now possible to collect large sums of health-related data which has the potential to transform healthcare. Proteomics, with its central position as downstream of genetics and epigenetic inputs and upstream of biochemical outputs and integrators of environmental signals, is well-positioned to contribute to health discoveries and management. We present our perspective on the role of proteomics and other Omics in precision health and medicine.

    View details for PubMedID 30296097

  • Wearables and the medical revolution. Personalized medicine Dunn, J., Runge, R., Snyder, M. 2018

    Abstract

    Wearable sensors are already impacting healthcare and medicine by enabling health monitoring outside of the clinic and prediction of health events. This paper reviews current and prospective wearable technologies and their progress toward clinical application. We describe technologies underlying common, commercially available wearable sensors and early-stage devices and outline research, when available, to support the use of these devices in healthcare. We cover applications in the following health areas: metabolic, cardiovascular and gastrointestinal monitoring; sleep, neurology, movement disorders and mental health; maternal, pre- and neo-natal care; and pulmonary health and environmental exposures. Finally, we discuss challenges associated with the adoption of wearable sensors in the current healthcare ecosystem and discuss areas for future research and development.

    View details for PubMedID 30259801

  • Dual Platform Long-Read RNA-Sequencing Dataset of the Human Cytomegalovirus Lytic Transcriptome. Frontiers in genetics Balázs, Z., Tombácz, D., Szűcs, A., Snyder, M., Boldogkői, Z. 2018; 9: 432

    View details for DOI 10.3389/fgene.2018.00432

    View details for PubMedID 30319694

    View details for PubMedCentralID PMC6170618

  • Disruption of mesoderm formation during cardiac differentiation due to developmental exposure to 13-cis-retinoic acid. Scientific reports Liu, Q., Van Bortle, K., Zhang, Y., Zhao, M., Zhang, J. Z., Geller, B. S., Gruber, J. J., Jiang, C., Wu, J. C., Snyder, M. P. 2018; 8 (1): 12960

    Abstract

    13-cis-retinoic acid (isotretinoin, INN) is an oral pharmaceutical drug used for the treatment of skin acne, and is also a known teratogen. In this study, the molecular mechanisms underlying INN-induced developmental toxicity during early cardiac differentiation were investigated using both human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs). Pre-exposure of hiPSCs and hESCs to a sublethal concentration of INN did not influence cell proliferation and pluripotency. However, mesodermal differentiation was disrupted when INN was included in the medium during differentiation. Transcriptomic profiling by RNA-seq revealed that INN exposure leads to aberrant expression of genes involved in several signaling pathways that control early mesoderm differentiation, such as TGF-beta signaling. In addition, genome-wide chromatin accessibility profiling by ATAC-seq suggested that INN-exposure leads to enhanced DNA-binding of specific transcription factors (TFs), including HNF1B, SOX10 and NFIC, often in close spatial proximity to genes that are dysregulated in response to INN treatment. Altogether, these results identify potential molecular mechanisms underlying INN-induced perturbation during mesodermal differentiation in the context of cardiac development. This study further highlights the utility of human stem cells as an alternative system for investigating congenital diseases of newborns that arise as a result of maternal drug exposure during pregnancy.

    View details for PubMedID 30154523

  • A Cloud-Based Metabolite and Chemical Prioritization System for the Biology/Disease-Driven Human Proteome Project. Journal of proteome research Yu, K., Lee, T. M., Chen, Y., Re, C., Kou, S. C., Chiang, J., Snyder, M., Kohane, I. S. 2018

    Abstract

    Targeted metabolomics and biochemical studies complement the ongoing investigations led by the Human Proteome Organization (HUPO) Biology/Disease-Driven Human Proteome Project (B/D-HPP). However, it is challenging to identify and prioritize metabolite and chemical targets. Literature-mining-based approaches have been proposed for target proteomics studies, but text mining methods for metabolite and chemical prioritization are hindered by a large number of synonyms and nonstandardized names of each entity. In this study, we developed a cloud-based literature mining and summarization platform that maps metabolites and chemicals in the literature to unique identifiers and summarizes the copublication trends of metabolites/chemicals and B/D-HPP topics using Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores. We successfully prioritized metabolites and chemicals associated with the B/D-HPP targeted fields and validated the results by checking against expert-curated associations and enrichment analyses. Compared with existing algorithms, our system achieved better precision and recall in retrieving chemicals related to B/D-HPP focused areas. Our cloud-based platform enables queries on all biological terms in multiple species, which will contribute to B/D-HPP and targeted metabolomics/chemical studies.

    View details for PubMedID 30094994

  • Long-Read Sequencing Revealed an Extensive Transcript Complexity in Herpesviruses. Frontiers in genetics Tombácz, D., Balázs, Z., Csabai, Z., Snyder, M., Boldogkői, Z. 2018; 9: 259

    Abstract

    Long-read sequencing (LRS) techniques are very recent advancements, but they have already been used for transcriptome research in all of the three subfamilies of herpesviruses. These techniques have multiplied the number of known transcripts in each of the examined viruses. Meanwhile, they have revealed a so far hidden complexity of the herpesvirus transcriptome with the discovery of a large number of novel RNA molecules, including coding and non-coding RNAs, as well as transcript isoforms, and polycistronic RNAs. Additionally, LRS techniques have uncovered an intricate meshwork of transcriptional overlaps between adjacent and distally located genes. Here, we review the contribution of LRS to herpesvirus transcriptomics and present the complexity revealed by this technology, while also discussing the functional significance of this phenomenon.

    View details for DOI 10.3389/fgene.2018.00259

    View details for PubMedID 30065753

    View details for PubMedCentralID PMC6056645

  • An integrated global regulatory network of hematopoietic precursor cell self-renewal and differentiation INTEGRATIVE BIOLOGY You, Y., Duran, R., Jiang, L., Dong, X., Zong, S., Snyder, M., Wu, J. 2018; 10 (7): 390–405

    Abstract

    Systematic study of the regulatory mechanisms of Hematopoietic Stem Cell and Progenitor Cell (HSPC) self-renewal is fundamentally important for understanding hematopoiesis and for manipulating HSPCs for therapeutic purposes. Previously, we have characterized gene expression and identified important transcription factors (TFs) regulating the switch between self-renewal and differentiation in a multipotent Hematopoietic Progenitor Cell (HPC) line, EML (Erythroid, Myeloid, and Lymphoid) cells. Herein, we report binding maps for additional TFs (SOX4 and STAT3) by using chromatin immunoprecipitation (ChIP)-Sequencing, to address the underlying mechanisms regulating self-renewal properties of lineage-CD34+ subpopulation (Lin-CD34+ EML cells). Furthermore, we applied the Assay for Transposase Accessible Chromatin (ATAC)-Sequencing to globally identify the open chromatin regions associated with TF binding in the self-renewing Lin-CD34+ EML cells. Mass spectrometry (MS) was also used to quantify protein relative expression levels. Finally, by integrating the protein-protein interaction database, we built an expanded transcriptional regulatory and interaction network. We found that MAPK (Mitogen-activated protein kinase) pathway and TGF-β/SMAD signaling pathway components were highly enriched among the binding targets of these TFs in Lin-CD34+ EML cells. The present study integrates regulatory information at multiple levels to paint a more comprehensive picture of the HSPC self-renewal mechanisms.

    View details for PubMedID 29892750

  • High Throughput Sequencing and Assessing Disease Risk. Cold Spring Harbor perspectives in medicine Rego, S. M., Snyder, M. P. 2018

    Abstract

    High-throughput sequencing has dramatically improved our ability to determine and diagnose the underlying causes of human disease. The use of whole-genome and whole-exome sequencing has facilitated faster and more cost-effective identification of new genes implicated in Mendelian disease. It has also improved our ability to identify disease-causing mutations for Mendelian diseases whose associated genes are already known. These benefits apply not only in cases in which the objective is to assess genetic disease risk in adults and children, but also for prenatal genetic testing and embryonic testing. High-throughput sequencing has also impacted our ability to assess risk for complex diseases and will likely continue to influence this area of disease research as more and more individuals undergo sequencing and we better understand the significance of variation, both rare and common, across the genome. Through these activities, high-throughput sequencing has the potential to revolutionize medicine.

    View details for PubMedID 29959131

  • Transcriptome-wide survey of pseudorabies virus using next- and third-generation sequencing platforms SCIENTIFIC DATA Tombacz, D., Sharon, D., Szucs, A., Moldovan, N., Snyder, M., Boldogkoi, Z. 2018; 5: 180119

    Abstract

    Pseudorabies virus (PRV) is an alphaherpesvirus of swine. PRV has a large double-stranded DNA genome and, as the latest investigations have revealed, a very complex transcriptome. Here, we present a large RNA-Seq dataset, derived from both short- and long-read sequencing. The dataset contains 1.3 million 100 bp paired-end reads that were obtained from the Illumina random-primed libraries, as well as 10 million 50 bp single-end reads generated by the Illumina polyA-seq. The Pacific Biosciences RSII non-amplified method yielded 57,021 reads of inserts (ROIs) aligned to the viral genome, the amplified method resulted in 158,396 PRV-specific ROIs, while we obtained 12,555 ROIs using the Sequel platform. The Oxford Nanopore's MinION device generated 44,006 reads using their regular cDNA-sequencing method, whereas 29,832 and 120,394 reads were produced by using the direct RNA-sequencing and the Cap-selection protocols, respectively. The raw reads were aligned to the PRV reference genome (KJ717942.1). Our provided dataset can be used to compare different sequencing approaches, library preparation methods, as well as for validation and testing bioinformatic pipelines.

    View details for PubMedID 29917014

  • Integrative omics for health and disease NATURE REVIEWS GENETICS Karczewski, K. J., Snyder, M. P. 2018; 19 (5): 299–310

    Abstract

    Advances in omics technologies - such as genomics, transcriptomics, proteomics and metabolomics - have begun to enable personalized medicine at an extraordinarily detailed molecular level. Individually, these technologies have contributed medical advances that have begun to enter clinical practice. However, each technology individually cannot capture the entire biological complexity of most human diseases. Integration of multiple technologies has emerged as an approach to provide a more comprehensive view of biology and disease. In this Review, we discuss the potential for combining diverse types of data and the utility of this approach in human health and disease. We provide examples of data integration to understand, diagnose and inform treatment of diseases, including rare and common diseases as well as cancer and transplant biology. Finally, we discuss technical and other challenges to clinical implementation of integrative omics.

    View details for PubMedID 29479082

    View details for PubMedCentralID PMC5990367

  • Personal Omics for Precision Health CIRCULATION RESEARCH Kellogg, R. A., Dunn, J., Snyder, M. P. 2018; 122 (9): 1169–71

    View details for PubMedID 29700064

  • Fast Metagenomic Binning via Hashing and Bayesian Clustering JOURNAL OF COMPUTATIONAL BIOLOGY Popic, V., Kuleshov, V., Snyder, M., Batzoglou, S. 2018

    Abstract

    We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks. It also provides a way to index metagenomic samples (e.g., from public repositories such as the Human Microbiome Project) offline once and reuse them across experiments; furthermore, the small size of the sample indices allows them to be easily transferred and stored. Leveraging the MinHash technique, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, enabling easy indexing and reuse of publicly available metagenomic data sets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.

    View details for PubMedID 29658784

  • Distinct transcriptomic and exomic abnormalities within myelodysplastic syndrome marrow cells. Leukemia & lymphoma Im, H., Rao, V., Sridhar, K., Bentley, J., Mishra, T., Chen, R., Hall, J., Graber, A., Zhang, Y., Li, X., Mias, G. I., Snyder, M. P., Greenberg, P. L. 2018: 1-11

    Abstract

    To provide biologic insights into mechanisms underlying myelodysplastic syndromes (MDS) we evaluated the CD34+ marrow cells transcriptome using high-throughput RNA sequencing (RNA-Seq). We demonstrated significant differential gene expression profiles (GEPs) between MDS and normal and identified 41 disease classifier genes. Additionally, two main clusters of GEPs distinguished patients based on their major clinical features, particularly between those whose disease remained stable versus patients who transformed into acute myeloid leukemia within 12 months. The genes whose expression was associated with disease outcome were involved in functional pathways and biologic processes highly relevant for MDS. Combined with exomic analysis we identified differential isoform usage of genes in MDS mutational subgroups, with consequent dysregulation of distinct biologic functions. This combination of clinical, transcriptomic and exomic findings provides valuable understanding of mechanisms underlying MDS and its progression to a more aggressive stage and also facilitates prognostic characterization of MDS patients.

    View details for DOI 10.1080/10428194.2018.1452210

    View details for PubMedID 29616851

  • A global transcriptional network connecting noncoding mutations to changes in tumor gene expression NATURE GENETICS Zhang, W., Bojorquez-Gomez, A., Velez, D., Xu, G., Sanchez, K. S., Shen, J., Chen, K., Licon, K., Melton, C., Olson, K. M., Yu, M., Huang, J. K., Carter, H., Farley, E. K., Snyder, M., Fraley, S. I., Kreisberg, J. F., Ideker, T. 2018; 50 (4): 613-+

    Abstract

    Although cancer genomes are replete with noncoding mutations, the effects of these mutations remain poorly characterized. Here we perform an integrative analysis of 930 tumor whole genomes and matched transcriptomes, identifying a network of 193 noncoding loci in which mutations disrupt target gene expression. These 'somatic eQTLs' (expression quantitative trait loci) are frequently mutated in specific cancer tissues, and the majority can be validated in an independent cohort of 3,382 tumors. Among these, we find that the effects of noncoding mutations on DAAM1, MTG2 and HYI transcription are recapitulated in multiple cancer cell lines and that increasing DAAM1 expression leads to invasive cell migration. Collectively, the noncoding loci converge on a set of core pathways, permitting a classification of tumors into pathway-based subtypes. The somatic eQTL network is disrupted in 88% of tumors, suggesting widespread impact of noncoding mutations in cancer.

    View details for PubMedID 29610481

    View details for PubMedCentralID PMC5893414

  • NF90/ILF3 is a transcription factor that promotes proliferation over differentiation by hierarchical regulation in K562 erythroleukemia cells PLOS ONE Wu, T., Shi, L., Adrian, J., Shi, M., Nair, R. V., Snyder, M. P., Kao, P. N. 2018; 13 (3): e0193126

    Abstract

    NF90 and splice variant NF110 are DNA- and RNA-binding proteins encoded by the Interleukin enhancer-binding factor 3 (ILF3) gene that have been established to regulate RNA splicing, stabilization and export. The roles of NF90 and NF110 in regulating transcription as chromatin-interacting proteins have not been comprehensively characterized. Here, chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) identified 9,081 genomic sites specifically occupied by NF90/NF110 in K562 cells. One third of NF90/NF110 peaks occurred at promoters of annotated genes. NF90/NF110 occupancy colocalized with chromatin marks associated with active promoters and strong enhancers. Comparison with 150 ENCODE ChIP-seq experiments revealed that NF90/NF110 clustered with transcription factors exhibiting preference for promoters over enhancers (POLR2A, MYC, YY1). Differential gene expression analysis following shRNA knockdown of NF90/NF110 in K562 cells revealed that NF90/NF110 activates transcription factors that drive growth and proliferation (EGR1, MYC), while attenuating differentiation along the erythroid lineage (KLF1). NF90/NF110 associates with chromatin to hierarchically regulate transcription factors that promote proliferation and suppress differentiation.

    View details for PubMedID 29590119

  • Circular DNA elements of chromosomal origin are common in healthy human somatic tissue NATURE COMMUNICATIONS Moller, H., Mohiyuddin, M., Prada-Luengo, I., Sailani, M., Halling, J., Plomgaard, P., Maretty, L., Hansen, A., Snyder, M. P., Pilegaard, H., Lam, H. K., Regenberg, B. 2018; 9: 1069

    Abstract

    The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, it must be expected that kb eccDNAs exist in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei and recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute to more eccDNAs per megabase and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.

    View details for PubMedID 29540679

  • An Integrated Understanding of the Rapid Metabolic Benefits of a Carbohydrate-Restricted Diet on Hepatic Steatosis in Humans CELL METABOLISM Mardinoglu, A., Wu, H., Bjornson, E., Zhang, C., Hakkarainen, A., Rasanen, S. M., Lee, S., Mancina, R. M., Bergentall, M., Pietilainen, K. H., Soderlund, S., Matikainen, N., Stahlman, M., Bergh, P., Adiels, M., Piening, B. D., Graner, M., Lundbom, N., Williams, K. J., Romeo, S., Nielsen, J., Snyder, M., Uhlen, M., Bergstrom, G., Perkins, R., Marschall, H., Backhed, F., Taskinen, M., Boren, J. 2018; 27 (3): 559-+

    Abstract

    A carbohydrate-restricted diet is a widely recommended intervention for non-alcoholic fatty liver disease (NAFLD), but a systematic perspective on the multiple benefits of this diet is lacking. Here, we performed a short-term intervention with an isocaloric low-carbohydrate diet with increased protein content in obese subjects with NAFLD and characterized the resulting alterations in metabolism and the gut microbiota using a multi-omics approach. We observed rapid and dramatic reductions of liver fat and other cardiometabolic risk factors paralleled by (1) marked decreases in hepatic de novo lipogenesis; (2) large increases in serum β-hydroxybutyrate concentrations, reflecting increased mitochondrial β-oxidation; and (3) rapid increases in folate-producing Streptococcus and serum folate concentrations. Liver transcriptomic analysis on biopsy samples from a second cohort revealed downregulation of the fatty acid synthesis pathway and upregulation of folate-mediated one-carbon metabolism and fatty acid oxidation pathways. Our results highlight the potential of exploring diet-microbiota interactions for treating NAFLD.

    View details for PubMedID 29456073

  • Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder AMERICAN JOURNAL OF HUMAN GENETICS Olahova, M., Yoon, W., Thompson, K., Jangam, S., Fernandez, L., Davidson, J. M., Kyle, J. E., Grove, M. E., Fisk, D. G., Kohler, J. N., Holmes, M., Dries, A. M., Huang, Y., Zhao, C., Contrepois, K., Zappala, Z., Fresard, L., Waggott, D., Zink, E. M., Kim, Y., Heyman, H. M., Stratton, K. G., Webb-Robertson, B. M., Snyder, M., Merker, J. D., Montgomery, S. B., Fisher, P. G., Feichtinger, R. G., Mayr, J. A., Hall, J., Barbosa, I. A., Simpson, M. A., Deshpande, C., Waters, K. M., Koeller, D. M., Metz, T. O., Morris, A. A., Schelley, S., Cowan, T., Friederich, M. W., McFarland, R., Van Hove, J. K., Enns, G. M., Yamamoto, S., Ashley, E. A., Wangler, M. F., Taylor, R. W., Bellen, H. J., Bernstein, J. A., Wheeler, M. T., Undiagnosed Diseases Network 2018; 102 (3): 494–504

    Abstract

    ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.

    View details for PubMedID 29478781

  • Full Genome Sequence of the Western Reserve Strain of Vaccinia Virus Determined by Third-Generation Sequencing MICROBIOLOGY RESOURCE ANNOUNCEMENTS Prazsak, I., Tombacz, D., Szucs, A., Denes, B., Snyder, M., Boldogkoi, Z. 2018; 6 (11)

    Abstract

    The vaccinia virus is a large, complex virus belonging to the Poxviridae family. Here, we report the complete, annotated genome sequence of the neurovirulent Western Reserve laboratory strain of this virus, which was sequenced on the Pacific Biosciences RS II and Oxford Nanopore MinION platforms.

    View details for PubMedID 29545308

  • Applying genomics in heart transplantation TRANSPLANT INTERNATIONAL Keating, B. J., Pereira, A. C., Snyder, M., Piening, B. D. 2018; 31 (3): 278–90

    Abstract

    While advances in patient care and immunosuppressive pharmacotherapies have increased the lifespan of heart allograft recipients, there are still significant comorbidities post-transplantation and 5-year survival rates are still significant, at approximately 70%. The last decade has seen massive strides in genomics and other omics fields, including transcriptomics, with many of these advances now starting to impact heart transplant clinical care. This review summarizes a number of the key advances in genomics which are relevant for heart transplant outcomes, and we highlight the translational potential that such knowledge may bring to patient care within the next decade.

    View details for PubMedID 29363220

    View details for PubMedCentralID PMC5990370

  • Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells FEMS MICROBIOLOGY LETTERS Moldovan, N., Szucs, A., Tombacz, D., Balazs, Z., Csabai, Z., Snyder, M., Boldogkoi, Z. 2018; 365 (5)

    Abstract

    In this study, we applied short- and long-read RNA sequencing techniques, as well as PCR analysis, to investigate the transcriptome of the porcine endogenous retrovirus (PERV) expressed from the cultured porcine kidney cell line PK-15. This analysis has revealed six novel transcripts and eight transcript isoforms, including five length and three splice variants. We were able to establish whether a deletion in a transcript is the result of the splicing of mRNAs or of genomic deletion in one of the PERV clones. Additionally, we re-annotated the formerly identified RNA molecules. Our analysis revealed that the PERV transcriptome is more complex than previously believed.

    View details for PubMedID 29361122

  • Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus FRONTIERS IN MICROBIOLOGY Moldovan, N., Tombacz, D., Szucs, A., Csabai, Z., Snyder, M., Boldogkoi, Z. 2018; 8

  • Omics AnalySIs System for PRecision Oncology (OASISPRO): a web-based omics analysis tool for clinical phenotype prediction BIOINFORMATICS Yu, K., Fitzpatrick, M. R., Pappas, L., Chan, W., Kung, J., Snyder, M. 2018; 34 (2): 319–20

    Abstract

    Precision oncology is an approach that accounts for individual differences to guide cancer management. Omics signatures have been shown to predict clinical traits for cancer patients. However, the vast amount of omics information poses an informatics challenge in systematically identifying patterns associated with health outcomes, and no general-purpose data-mining tool exists for physicians, medical researchers, and citizen scientists without significant training in programming and bioinformatics. To bridge this gap, we built the Omics AnalySIs System for PRecision Oncology (OASISPRO), a web-based system to mine the quantitative omics information from The Cancer Genome Atlas (TCGA). This system effectively visualizes patients' clinical profiles, executes machine-learning algorithms of choice on the omics data, and evaluates the prediction performance using held-out test sets. With this tool, we successfully identified genes strongly associated with tumor stage, and accurately predicted patients' survival outcomes in many cancer types, including mesothelioma and adrenocortical carcinoma. By identifying the links between omics and clinical phenotypes, this system will facilitate omics studies on precision cancer medicine and contribute to establishing personalized cancer treatment plans. This web-based tool is available at http://tinyurl.com/oasispro; source code is available at http://tinyurl.com/oasisproSourceCode.

    View details for PubMedID 28968749

    View details for PubMedCentralID PMC5860203

  • How many human proteoforms are there? Nature chemical biology Aebersold, R., Agar, J. N., Amster, I. J., Baker, M. S., Bertozzi, C. R., Boja, E. S., Costello, C. E., Cravatt, B. F., Fenselau, C., Garcia, B. A., Ge, Y., Gunawardena, J., Hendrickson, R. C., Hergenrother, P. J., Huber, C. G., Ivanov, A. R., Jensen, O. N., Jewett, M. C., Kelleher, N. L., Kiessling, L. L., Krogan, N. J., Larsen, M. R., Loo, J. A., Ogorzalek Loo, R. R., Lundberg, E., MacCoss, M. J., Mallick, P., Mootha, V. K., Mrksich, M., Muir, T. W., Patrie, S. M., Pesavento, J. J., Pitteri, S. J., Rodriguez, H., Saghatelian, A., Sandoval, W., Schlüter, H., Sechi, S., Slavoff, S. A., Smith, L. M., Snyder, M. P., Thomas, P. M., Uhlén, M., Van Eyk, J. E., Vidal, M., Walt, D. R., White, F. M., Williams, E. R., Wohlschlager, T., Wysocki, V. H., Yates, N. A., Young, N. L., Zhang, B. 2018; 14 (3): 206–14

    Abstract

    Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.

    View details for PubMedID 29443976

  • Functional regulatory mechanism of smooth muscle cell-restricted LMOD1 coronary artery disease locus. PLoS genetics Nanda, V., Wang, T., Pjanic, M., Liu, B., Nguyen, T., Matic, L. P., Hedin, U., Koplev, S., Ma, L., Franzén, O., Ruusalepp, A., Schadt, E. E., Björkegren, J. L., Montgomery, S. B., Snyder, M. P., Quertermous, T., Leeper, N. J., Miller, C. L. 2018; 14 (11): e1007755

    Abstract

    Recent genome-wide association studies (GWAS) have identified multiple new loci which appear to alter coronary artery disease (CAD) risk via arterial wall-specific mechanisms. One of the annotated genes encodes LMOD1 (Leiomodin 1), a member of the actin filament nucleator family that is highly enriched in smooth muscle-containing tissues such as the artery wall. However, it is still unknown whether LMOD1 is the causal gene at this locus and also how the associated variants alter LMOD1 expression/function and CAD risk. Using epigenomic profiling we recently identified a non-coding regulatory variant, rs34091558, which is in tight linkage disequilibrium (LD) with the lead CAD GWAS variant, rs2820315. Herein we demonstrate through expression quantitative trait loci (eQTL) and statistical fine-mapping in GTEx, STARNET, and human coronary artery smooth muscle cell (HCASMC) datasets, rs34091558 is the top regulatory variant for LMOD1 in vascular tissues. Position weight matrix (PWM) analyses identify the protective allele rs34091558-TA to form a conserved Forkhead box O3 (FOXO3) binding motif, which is disrupted by the risk allele rs34091558-A. FOXO3 chromatin immunoprecipitation and reporter assays show reduced FOXO3 binding and LMOD1 transcriptional activity by the risk allele, consistent with effects of FOXO3 downregulation on LMOD1. LMOD1 knockdown results in increased proliferation and migration and decreased cell contraction in HCASMC, and immunostaining in atherosclerotic lesions in the SMC lineage tracing reporter mouse support a key role for LMOD1 in maintaining the differentiated SMC phenotype. These results provide compelling functional evidence that genetic variation is associated with dysregulated LMOD1 expression/function in SMCs, together contributing to the heritable risk for CAD.

    View details for PubMedID 30444878

  • Distinct Transcriptomic and Exomic Abnormalities Within Myelodysplastic Syndrome Marrow Cells Leukemia & Lymphoma Im, H., Rao, V., Sridhar, K., Bentley, J., Mishra, T., Chen, R., Hall, J., Graber, A., Zhang, Y., Xiao, L., Mias, G., Snyder, M. P., Greenberg, P. L. 2018: 1-11

  • SETD7 Drives Cardiac Lineage Commitment through Stage-Specific Transcriptional Activation. Cell stem cell Lee, J., Shao, N. Y., Paik, D. T., Wu, H., Guo, H., Termglinchan, V., Churko, J. M., Kim, Y., Kitani, T., Zhao, M. T., Zhang, Y., Wilson, K. D., Karakikes, I., Snyder, M. P., Wu, J. C. 2018; 22 (3): 428–44.e5

    Abstract

    Cardiac development requires coordinated and large-scale rearrangements of the epigenome. The roles and precise mechanisms through which specific epigenetic modifying enzymes control cardiac lineage specification, however, remain unclear. Here we show that the H3K4 methyltransferase SETD7 controls cardiac differentiation by reading H3K36 marks independently of its enzymatic activity. Through chromatin immunoprecipitation sequencing (ChIP-seq), we found that SETD7 targets distinct sets of genes to drive their stage-specific expression during cardiomyocyte differentiation. SETD7 associates with different co-factors at these stages, including SWI/SNF chromatin-remodeling factors during mesodermal formation and the transcription factor NKX2.5 in cardiac progenitors to drive their differentiation. Further analyses revealed that SETD7 binds methylated H3K36 in the bodies of its target genes to facilitate RNA polymerase II (Pol II)-dependent transcription. Moreover, abnormal SETD7 expression impairs functional attributes of terminally differentiated cardiomyocytes. Together, these results reveal how SETD7 acts at sequential steps in cardiac lineage commitment, and they provide insights into crosstalk between dynamic epigenetic marks and chromatin-modifying enzymes.

    View details for PubMedID 29499155

  • Value of Circulating Cytokine Profiling During Submaximal Exercise Testing in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Scientific reports Moneghetti, K. J., Skhiri, M., Contrepois, K., Kobayashi, Y., Maecker, H., Davis, M., Snyder, M., Haddad, F., Montoya, J. G. 2018; 8 (1): 2779

    Abstract

    Myalgic Encephalomyelitis or Chronic Fatigue Syndrome (ME/CFS) is a heterogeneous syndrome in which patients often experience severe fatigue and malaise following exertion. Immune and cardiovascular dysfunction have been postulated to play a role in the pathophysiology. We therefore examined whether cytokine profiling or cardiovascular testing following exercise would differentiate patients with ME/CFS. Twenty-four ME/CFS patients were matched to 24 sedentary controls and underwent cardiovascular and circulating immune profiling. Cardiovascular analysis included echocardiography, cardiopulmonary exercise and endothelial function testing. Cytokine and growth factor profiles were analyzed using a 51-plex Luminex bead kit at baseline and 18 hours following exercise. Cardiac structure and exercise capacity were similar between groups. Sparse partial least square discriminant analyses of cytokine profiles 18 hours post exercise offered the most reliable discrimination between ME/CFS and controls (κ = 0.62 (0.34, 0.84)). The most discriminatory cytokines post exercise were CD40L, platelet activator inhibitor, interleukin 1-β, interferon-α and CXCL1. In conclusion, cytokine profiling following exercise may help differentiate patients with ME/CFS from sedentary controls.

    View details for PubMedID 29426834

  • Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform SCIENTIFIC DATA Balazs, Z., Tombacz, D., Szucs, A., Snyder, M., Boldogkoi, Z. 2017; 4: 170194

    Abstract

    Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989 and, although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.

    View details for PubMedID 29257134

  • Challenges and recommendations for epigenomics in precision health NATURE BIOTECHNOLOGY Carter, A. C., Chang, H. Y., Church, G., Dombkowski, A., Ecker, J. R., Gil, E., Giresi, P. G., Greely, H., Greenleaf, W. J., Hacohen, N., He, C., Hill, D., Ko, J., Kohane, I., Kundaje, A., Palmer, M., Snyder, M. P., Tung, J., Urban, A., Vidal, M., Wong, W. 2017; 35 (12): 1128–32

    View details for PubMedID 29220033

  • Cloud-based interactive analytics for terabytes of genomic variants data. Bioinformatics (Oxford, England) Pan, C., McInnes, G., Deflaux, N., Snyder, M., Bingham, J., Datta, S., Tsao, P. S. 2017; 33 (23): 3709-3715

    Abstract

    Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Code is available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact: cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btx468

    View details for PubMedID 28961771

    View details for PubMedCentralID PMC5860318

  • Long-Read Sequencing of Human Cytomegalovirus Transcriptome Reveals RNA Isoforms Carrying Distinct Coding Potentials SCIENTIFIC REPORTS Balazs, Z., Tombacz, D., Szucs, A., Csabai, Z., Megyeri, K., Petrov, A. N., Snyder, M., Boldogkoi, Z. 2017; 7: 15989

    Abstract

    The human cytomegalovirus (HCMV) is a ubiquitous, human pathogenic herpesvirus. The complete viral genome is transcriptionally active during infection; however, a large part of its transcriptome has yet to be annotated. In this work, we applied the amplified isoform sequencing technique from Pacific Biosciences to characterize the lytic transcriptome of HCMV strain Towne varS. We developed a pipeline for transcript annotation using long-read sequencing data. We identified 248 transcriptional start sites, 116 transcriptional termination sites and 80 splicing events. Using this information, we have annotated 291 previously undescribed or only partially annotated transcript isoforms, including eight novel antisense transcripts and their isoforms, as well as a novel transcript (RS2) in the short repeat region, partially antisense to RS1. Similarly to other organisms, we discovered a high transcriptional diversity in HCMV, with many transcripts only slightly differing from one another. Comparing our transcriptome profiling results to an earlier ribosome footprint analysis, we have concluded that the majority of the transcripts contain multiple translationally active ORFs, and also that most isoforms contain unique combinations of ORFs. Based on these results, we propose that one important function of this transcriptional diversity may be to provide a regulatory mechanism at the level of translation.

    View details for PubMedID 29167532

  • Transcriptomic and epigenomic differences in human induced pluripotent stem cells generated from six reprogramming methods. Nature biomedical engineering Churko, J. M., Lee, J., Ameen, M., Gu, M., Venkatasubramanian, M., Diecke, S., Sallam, K., Im, H., Wang, G., Gold, J. D., Salomonis, N., Snyder, M. P., Wu, J. C. 2017; 1 (10): 826-837

    Abstract

    Many reprogramming methods can generate human induced pluripotent stem cells (hiPSCs) that closely resemble human embryonic stem cells (hESCs). This has led to assessments of how similar hiPSCs are to hESCs, by evaluating differences in gene expression, epigenetic marks and differentiation potential. However, all previous studies were performed using hiPSCs acquired from different laboratories, passage numbers, culturing conditions, genetic backgrounds and reprogramming methods, all of which may contribute to the reported differences. Here, by using high-throughput sequencing under standardized cell culturing conditions and passage number, we compare the epigenetic signatures (H3K4me3, H3K27me3 and HDAC2 ChIP-seq profiles) and transcriptome differences (by RNA-seq) of hiPSCs generated from the same primary fibroblast population by using six different reprogramming methods. We found that the reprogramming method impacts the resulting transcriptome and that all hiPSC lines could terminally differentiate, regardless of the reprogramming method. Moreover, by comparing the differences between the hiPSC and hESC lines, we observed a significant proportion of differentially expressed genes that could be attributed to polycomb repressive complex targets.

    View details for DOI 10.1038/s41551-017-0141-6

    View details for PubMedID 30263871

    View details for PubMedCentralID PMC6155993

  • Long-Read Sequencing Reveals a GC Pressure during the Evolution of Porcine Endogenous Retrovirus MICROBIOLOGY RESOURCE ANNOUNCEMENTS Szucs, A., Moldovan, N., Tombacz, D., Csabai, Z., Snyder, M., Boldogkoi, Z. 2017; 5 (40)

    Abstract

    Here, we present the complete genome sequence of a porcine endogenous retrovirus determined by Pacific Biosciences sequencing. A comparison of the genome of this isolate with those of other strains revealed the operation of a mechanism resulting in the selective accumulation of G and C bases in the viral DNA.

    View details for PubMedID 28982996

  • Novel nonsense gain-of-function NFKB2 mutations associated with a combined immunodeficiency phenotype BLOOD Kuehn, H., Niemela, J. E., Sreedhara, K., Stoddard, J. L., Grossman, J., Wysocki, C. A., de la Morena, M., Garofalo, M., Inlora, J., Snyder, M. P., Lewis, D. B., Stratakis, C. A., Fleisher, T. A., Rosenzweig, S. D. 2017; 130 (13): 1553–64

    Abstract

    NF-κB signaling through its NFKB1-dependent canonical and NFKB2-dependent noncanonical pathways plays distinctive roles in a diverse range of immune processes. Recently, mutations in these 2 genes have been associated with common variable immunodeficiency (CVID). While studying patients with genetically uncharacterized primary immunodeficiencies, we detected 2 novel nonsense gain-of-function (GOF) NFKB2 mutations (E418X and R635X) in 3 patients from 2 families, and a novel missense change (S866R) in another patient. Their immunophenotype was assessed by flow cytometry and protein expression; activation of canonical and noncanonical pathways was examined in peripheral blood mononuclear cells and transfected HEK293T cells through immunoblotting, immunohistochemistry, luciferase activity, real-time polymerase chain reaction, and multiplex assays. The S866R change disrupted a C-terminal NF-κB2 critical site affecting protein phosphorylation and nuclear translocation, resulting in CVID with adrenocorticotropic hormone deficiency, growth hormone deficiency, and mild ectodermal dysplasia as previously described. In contrast, the nonsense mutations E418X and R635X observed in 3 patients led to constitutive nuclear localization and activation of both canonical and noncanonical NF-κB pathways, resulting in a combined immunodeficiency (CID) without endocrine or ectodermal manifestations. These changes were also found in 2 asymptomatic relatives. Thus, these novel NFKB2 GOF mutations produce a nonfully penetrant CID phenotype through a different pathophysiologic mechanism than previously described for mutations in NFKB2.

    View details for PubMedID 28778864

    View details for PubMedCentralID PMC5620416

  • Evaluation of the impact of ul54 gene-deletion on the global transcription and DNA replication of pseudorabies virus ARCHIVES OF VIROLOGY Csabai, Z., Takacs, I. F., Snyder, M., Boldogkoi, Z., Tombacz, D. 2017; 162 (9): 2679–94

    Abstract

    Pseudorabies virus (PRV) is an animal alphaherpesvirus with a wide host range. PRV has 67 protein-coding genes and several non-coding RNA molecules, which can be classified into three temporal groups, immediate early, early and late classes. The ul54 gene of PRV and its homolog icp27 of herpes simplex virus have a multitude of functions, including the regulation of viral DNA synthesis and the control of the gene expression. Therefore, abrogation of PRV ul54 function was expected to exert a significant effect on the global transcriptome and on DNA replication. Real-time PCR and real-time RT-PCR platforms were used to investigate these presumed effects. Our analyses revealed a drastic impact of the ul54 mutation on the genome-wide expression of PRV genes, especially on the transcription of the true late genes. A more than two hour delay was observed in the onset of DNA replication, and the amount of synthesized DNA molecules was significantly decreased in comparison to the wild-type virus. Furthermore, in this work, we were able to successfully demonstrate the utility of long-read SMRT sequencing for genotyping of mutant viruses.

    View details for PubMedID 28577213

    View details for PubMedCentralID PMC5927779

  • High-Coverage Whole-Exome Sequencing Identifies Candidate Genes for Suicide in Victims with Major Depressive Disorder SCIENTIFIC REPORTS Tombacz, D., Maroti, Z., Kalmar, T., Csabai, Z., Balazs, Z., Takahashi, S., Palkovits, M., Snyder, M., Boldogkoi, Z. 2017; 7: 7106

    Abstract

    We carried out whole-exome ultra-high throughput sequencing in brain samples of suicide victims who had suffered from major depressive disorder and control subjects who had died from other causes. This study aimed to reveal the selective accumulation of rare variants in the coding and the UTR sequences within the genes of suicide victims. We also analysed the potential effect of STR and CNV variations, as well as the infection of the brain with neurovirulent viruses in this behavioural disorder. As a result, we have identified several candidate genes, among others three calcium channel genes that may potentially contribute to completed suicide. We also explored the potential implication of the TGF-β signalling pathway in the pathogenesis of suicidal behaviour. To our best knowledge, this is the first study that uses whole-exome sequencing for the investigation of suicide.

    View details for PubMedID 28769055

  • Network analyses identify liver-specific targets for treating liver diseases MOLECULAR SYSTEMS BIOLOGY Lee, S., Zhang, C., Liu, Z., Klevstig, M., Mukhopadhyay, B., Bergentall, M., Cinar, R., Stahlman, M., Sikanic, N., Park, J. K., Deshmukh, S., Harzandi, A. M., Kuijpers, T., Grotli, M., Elsasser, S. J., Piening, B. D., Snyder, M., Smith, U., Nielsen, J., Backhed, F., Kunos, G., Uhlen, M., Boren, J., Mardinoglu, A. 2017; 13 (8): 938

    Abstract

    We performed integrative network analyses to identify targets that can be used for effectively treating liver diseases with minimal side effects. We first generated co-expression networks (CNs) for 46 human tissues and liver cancer to explore the functional relationships between genes and examined the overlap between functional and physical interactions. Since increased de novo lipogenesis is a characteristic of nonalcoholic fatty liver disease (NAFLD) and hepatocellular carcinoma (HCC), we investigated the liver-specific genes co-expressed with fatty acid synthase (FASN). CN analyses predicted that inhibition of these liver-specific genes decreases FASN expression. Experiments in human cancer cell lines, mouse liver samples, and primary human hepatocytes validated our predictions by demonstrating functional relationships between these liver genes, and showing that their inhibition decreases cell growth and liver fat content. In conclusion, we identified liver-specific genes linked to NAFLD pathogenesis, such as pyruvate kinase liver and red blood cell (PKLR), or to HCC pathogenesis, such as PKLR, patatin-like phospholipase domain containing 3 (PNPLA3), and proprotein convertase subtilisin/kexin type 9 (PCSK9), all of which are potential targets for drug development.

    View details for PubMedID 28827398

  • A Droplet Microfluidics Based Platform for Mining Metagenomic Libraries for Natural Compounds MICROMACHINES Theodorou, E., Scanga, R., Twardowski, M., Snyder, M. P., Brouzes, E. 2017; 8 (8)

    Abstract

    Historically, microbes from the environment have been a reliable source for novel bio-active compounds. Cloning and expression of metagenomic DNA in heterologous strains of bacteria has broadened the range of potential compounds accessible. However, such metagenomic libraries have been under-exploited for applications in mammalian cells because of a lack of integrated methods. We present an innovative platform to systematically mine natural resources for pro-apoptotic compounds that relies on the combination of bacterial delivery and droplet microfluidics. Using the violacein operon from C. violaceum as a model, we demonstrate that E. coli modified to be invasive can serve as an efficient delivery vehicle of natural compounds. This approach permits the seamless screening of metagenomic libraries with mammalian cell assays and alleviates the need for laborious extraction of natural compounds. In addition, we leverage the unique properties of droplet microfluidics to amplify bacterial clones and perform clonal screening at high-throughput in place of one-compound-per-well assays in multi-well format. We also use droplet microfluidics to establish a cell aggregate strategy that overcomes the issue of background apoptosis. Altogether, this work forms the foundation of a versatile platform to efficiently mine the metagenome for compounds with therapeutic potential.

    View details for PubMedID 30400422

  • Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis SCIENTIFIC REPORTS Ma, S., Snyder, M., Dinesh-Kumar, S. P. 2017; 7: 5557

    Abstract

    Deciphering gene regulatory networks requires identification of gene expression modules. We describe a novel bottom-up approach to identify gene modules regulated by cis-regulatory motifs from a human gene co-expression network. Target genes of a cis-regulatory motif were identified from the network via the motif's enrichment or biased distribution towards transcription start sites in the promoters of co-expressed genes. A gene sub-network containing the target genes was extracted and used to derive gene modules. The analysis revealed known and novel gene modules regulated by the NF-Y motif. The binding of NF-Y proteins to these modules' gene promoters was verified using ENCODE ChIP-Seq data. The analyses also identified 8,048 Sp1 motif target genes, many of which, interestingly, were not detected by ENCODE ChIP-Seq. These target genes assemble into house-keeping, tissue-specific developmental, and immune response modules. Integration of Sp1 modules with genomic and epigenomic data indicates epigenetic control of Sp1 targets' expression in a cell/tissue specific manner. Finally, known and novel target genes and modules regulated by the YY1, RFX1, IRF1, and 34 other motifs were also identified. The study described here provides a valuable resource to understand transcriptional regulation of various human developmental, disease, or immunity pathways.

    View details for PubMedID 28717181

  • Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis NATURE COMMUNICATIONS Sahraeian, S., Mohiyuddin, M., Sebra, R., Tilgner, H., Afshar, P. T., Au, K., Asadi, N., Gerstein, M. B., Wong, W., Snyder, M. P., Schadt, E., Lam, H. K. 2017; 8: 59

    Abstract

    RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since it debuted. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.

    View details for PubMedID 28680106

  • Long-Read Isoform Sequencing Reveals a Hidden Complexity of the Transcriptional Landscape of Herpes Simplex Virus Type 1 FRONTIERS IN MICROBIOLOGY Tombacz, D., Csabai, Z., Szucs, A., Balazs, Z., Moldovan, N., Sharon, D., Snyder, M., Boldogkoi, Z. 2017; 8: 1079

    Abstract

    In this study, we used the amplified isoform sequencing technique from Pacific Biosciences to characterize the poly(A)+ fraction of the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). Our analysis detected 34 formerly unidentified protein-coding genes, 10 non-coding RNAs, as well as 17 polycistronic and complex transcripts. This work also led us to identify many transcript isoforms, including 13 splice and 68 transcript end variants, as well as several transcript overlaps. Additionally, we determined previously unascertained transcriptional start and polyadenylation sites. We analyzed the transcriptional activity from the complementary DNA strand in five convergent HSV gene pairs with quantitative RT-PCR and detected antisense RNAs in each gene. This part of the study revealed an inverse correlation between the expressions of convergent partners. Our work adds new insights for understanding the complexity of the pervasive transcriptional overlaps by suggesting that there is a crosstalk between adjacent and distal genes through interaction between their transcription apparatuses. We also identified transcripts overlapping the HSV replication origins, which may indicate an interplay between the transcription and replication machineries. The relative abundance of HSV-1 transcripts has also been established by using a novel method based on the calculation of sequencing reads for the analysis.

    View details for DOI 10.3389/fmicb.2017.01079

    View details for Web of Science ID 000403758800001

    View details for PubMedID 28676792

    View details for PubMedCentralID PMC5476775

  • Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells NUCLEIC ACIDS RESEARCH Han, L., Wu, H., Zhu, H., Kim, K., Marjani, S. L., Riester, M., Euskirchen, G., Zi, X., Yang, J., Han, J., Snyder, M., Park, I., Irizarry, R., Weissman, S. M., Michor, F., Fan, R., Pan, X. 2017; 45 (10): e77

    Abstract

    Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyze single cells from two types of hematopoietic cells, K562 and GM12878, and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation difference between cells is detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population.

    View details for PubMedID 28126923

    View details for PubMedCentralID PMC5605247

  • Isolated Congenital Anosmia and CNGA2 Mutation. Scientific reports Sailani, M. R., Jingga, I., MirMazlomi, S. H., Bitarafan, F., Bernstein, J. A., Snyder, M. P., Garshasbi, M. 2017; 7 (1): 2667

    Abstract

    Isolated congenital anosmia (ICA) is a rare condition that is associated with life-long inability to smell. Here we report a genetic characterization of a large Iranian family segregating ICA. Whole exome sequencing in five affected family members and five healthy members revealed a stop gain mutation in CNGA2 (OMIM 300338) (chrX:150,911,102; CNGA2. c.577C > T; p.Arg193*). The mutation segregates in an X-linked pattern, as all the affected family members are hemizygotes, whereas healthy family members are either heterozygote or homozygote for the reference allele. Cnga2 knockout mice are congenitally anosmic and have abnormal olfactory system physiology; additionally, Karstensen et al. recently reported two anosmic brothers sharing a CNGA2 truncating variant. Our study, in concert with these findings, provides strong support for a role of the CNGA2 gene in the pathogenicity of ICA in humans. Together, these results indicate that mutations in key olfactory signaling pathway genes are responsible for human disease.

    View details for DOI 10.1038/s41598-017-02947-y

    View details for PubMedID 28572688

  • Succinate and its G-protein-coupled receptor stimulates osteoclastogenesis. Nature communications Guo, Y., Xie, C., Li, X., Yang, J., Yu, T., Zhang, R., Zhang, T., Saxena, D., Snyder, M., Wu, Y., Li, X. 2017; 8: 15621

    Abstract

    The mechanism underlying bone impairment in patients with diabetes mellitus, a metabolic disorder characterized by chronic hyperglycaemia and dysregulation in metabolism, is unclear. Here we show the difference in the metabolomics of bone marrow stromal cells (BMSCs) derived from hyperglycaemic (type 2 diabetes mellitus, T2D) and normoglycaemic mice. One hundred and forty-two metabolites are substantially regulated in BMSCs from T2D mice, with the tricarboxylic acid (TCA) cycle being one of the primary metabolic pathways impaired by hyperglycaemia. Importantly, succinate, an intermediate metabolite in the TCA cycle, is increased by 24-fold in BMSCs from T2D mice. Succinate functions as an extracellular ligand through binding to its specific receptor on osteoclastic lineage cells and stimulates osteoclastogenesis in vitro and in vivo. Strategies targeting the receptor activation inhibit osteoclastogenesis. This study reveals a metabolite-mediated mechanism of osteoclastogenesis modulation that contributes to bone dysregulation in metabolic disorders.

    View details for DOI 10.1038/ncomms15621

    View details for PubMedID 28561074

  • Multi-platform analysis reveals a complex transcriptome architecture of a circovirus. Virus research Moldován, N., Balázs, Z., Tombácz, D., Csabai, Z., Szucs, A., Snyder, M., Boldogkoi, Z. 2017; 237: 37-46

    Abstract

    In this study, we used Pacific Biosciences RS II long-read and Illumina HiScanSQ short-read sequencing technologies for the characterization of porcine circovirus type 1 (PCV-1) transcripts. Our aim was to identify novel RNA molecules and transcript isoforms, as well as to determine the exact 5'- and 3'-end sequences of previously described transcripts with single base-pair accuracy. We discovered a novel 3'-UTR length isoform of the Cap transcript, and a non-spliced Cap transcript variant. Additionally, our analysis has revealed a 3'-UTR isoform of Rep and two 5'-UTR isoforms of Rep' transcripts, and a novel splice variant of the longer Rep' transcript. We also explored two novel long transcripts, one with a previously identified splice site, and a formerly undetected mRNA of ORF3. Altogether, our methods have identified nine novel RNA molecules, doubling the size of PCV-1 transcriptome that had been known before. Additionally, our investigations revealed an intricate pattern of transcript overlapping, which might produce transcriptional interference between the transcriptional machineries of adjacent genes, and thereby may potentially play a role in the regulation of gene expression in circoviruses.

    View details for DOI 10.1016/j.virusres.2017.05.010

    View details for PubMedID 28549855

  • Non-equivalence of Wnt and R-spondin ligands during Lgr5(+) intestinal stem-cell self-renewal NATURE Yan, K. S., Janda, C. Y., Chang, J., Zheng, G. X., Larkin, K. A., Luca, V. C., Chia, L. A., Mah, A. T., Han, A., Terry, J. M., Ootani, A., Roelf, K., Lee, M., Yuan, J., Li, X., Bolen, C. R., Wilhelmy, J., Davies, P. S., Ueno, H., von Furstenberg, R. J., Belgrader, P., Ziraldo, S. B., Ordonez, H., Henning, S. J., Wong, M. H., Snyder, M. P., Weissman, I. L., Hsueh, A. J., Mikkelsen, T. S., Garcia, K. C., Kuo, C. J. 2017; 545 (7653): 238-242

    Abstract

    The canonical Wnt/β-catenin signalling pathway governs diverse developmental, homeostatic and pathological processes. Palmitoylated Wnt ligands engage cell-surface frizzled (FZD) receptors and LRP5 and LRP6 co-receptors, enabling β-catenin nuclear translocation and TCF/LEF-dependent gene transactivation. Mutations in Wnt downstream signalling components have revealed diverse functions thought to be carried out by Wnt ligands themselves. However, redundancy between the 19 mammalian Wnt proteins and 10 FZD receptors and Wnt hydrophobicity have made it difficult to attribute these functions directly to Wnt ligands. For example, individual mutations in Wnt ligands have not revealed homeostatic phenotypes in the intestinal epithelium, an archetypal canonical, Wnt pathway-dependent, rapidly self-renewing tissue, the regeneration of which is fueled by proliferative crypt Lgr5(+) intestinal stem cells (ISCs). R-spondin ligands (RSPO1-RSPO4) engage distinct LGR4-LGR6, RNF43 and ZNRF3 receptor classes, markedly potentiate canonical Wnt/β-catenin signalling, and induce intestinal organoid growth in vitro and Lgr5(+) ISCs in vivo. However, the interchangeability, functional cooperation and relative contributions of Wnt versus RSPO ligands to in vivo canonical Wnt signalling and ISC biology remain unknown. Here we identify the functional roles of Wnt and RSPO ligands in the intestinal crypt stem-cell niche. We show that the default fate of Lgr5(+) ISCs is to differentiate, unless both RSPO and Wnt ligands are present. However, gain-of-function studies using RSPO ligands and a new non-lipidated Wnt analogue reveal that these ligands have qualitatively distinct, non-interchangeable roles in ISCs. Wnt proteins are unable to induce Lgr5(+) ISC self-renewal, but instead confer a basal competency by maintaining RSPO receptor expression that enables RSPO ligands to actively drive and specify the extent of stem-cell expansion. This functionally non-equivalent yet cooperative interaction between Wnt and RSPO ligands establishes a molecular precedent for regulation of mammalian stem cells by distinct priming and self-renewal factors, with broad implications for precise control of tissue regeneration.

    View details for DOI 10.1038/nature22313

    View details for Web of Science ID 000400963800037

    View details for PubMedID 28467820

  • Histone variant H2A.J accumulates in senescent cells and promotes inflammatory gene expression NATURE COMMUNICATIONS Contrepois, K., Coudereau, C., Benayoun, B. A., Schuler, N., Roux, P., Bischof, O., Courbeyrette, R., Carvalho, C., Thuret, J., Ma, Z., Derbois, C., Nevers, M., Volland, H., Redon, C. E., Bonner, W. M., Deleuze, J., Wiel, C., Bernard, D., Snyder, M. P., Ruebe, C. E., Olaso, R., Fenaille, F., Mann, C. 2017; 8

    Abstract

    The senescence of mammalian cells is characterized by a proliferative arrest in response to stress and the expression of an inflammatory phenotype. Here we show that histone H2A.J, a poorly studied H2A variant found only in mammals, accumulates in human fibroblasts in senescence with persistent DNA damage. H2A.J also accumulates in mice with aging in a tissue-specific manner and in human skin. Knock-down of H2A.J inhibits the expression of inflammatory genes that contribute to the senescent-associated secretory phenotype (SASP), and over expression of H2A.J increases the expression of some of these genes in proliferating cells. H2A.J accumulation may thus promote the signalling of senescent cells to the immune system, and it may contribute to chronic inflammation and the development of aging-associated diseases.

    View details for DOI 10.1038/ncomms14995

    View details for Web of Science ID 000400886800001

    View details for PubMedID 28489069

  • Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens NATURE COMMUNICATIONS Morgens, D. W., Wainberg, M., Boyle, E. A., Ursu, O., Araya, C. L., Tsui, C. K., Haney, M. S., Hess, G. T., Han, K., Jeng, E. E., Li, A., Snyder, M. P., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2017; 8

    Abstract

    CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.

    View details for DOI 10.1038/ncomms15178

    View details for PubMedID 28474669

  • A Case Report of Hypoglycemia and Hypogammaglobulinemia: DAVID syndrome in a patient with a novel NFKB2 mutation. Journal of clinical endocrinology and metabolism Lal, R. A., Bachrach, L. K., Hoffman, A. R., Inlora, J., Rego, S., Snyder, M. P., Lewis, D. B. 2017

    Abstract

    DAVID syndrome (Deficient Anterior pituitary with Variable Immune Deficiency) is a rare disorder in which children present with symptomatic ACTH deficiency preceded by hypogammaglobulinemia from B-cell dysfunction with recurrent infections, termed common variable immunodeficiency (CVID). Subsequent whole exome sequencing studies have revealed germline heterozygous C-terminal mutations of NFKB2 as either a cause of DAVID syndrome or of CVID without clinical hypopituitarism. However, to the best of our knowledge there have been no cases in which the endocrinopathy has presented in the absence of a prior clinical history of CVID. A previously healthy 7 year-old boy with no history of clinical immunodeficiency, presented with profound hypoglycemia and seizures. He was found to have secondary adrenal insufficiency and was started on glucocorticoid replacement. An evaluation for autoimmune disease, including for anti-pituitary antibodies, was negative. Evaluation unexpectedly revealed hypogammaglobulinemia (decreased IgG, IgM, and IgA). He had moderately reduced serotype-specific IgG responses following pneumococcal polysaccharide vaccine. Subsequently, he was found to have growth hormone (GH) deficiency. Six years after initial presentation, whole exome sequencing revealed a novel de novo heterozygous NFKB2 missense mutation c.2596A>C (p.Ser866Arg) in the C-terminal region predicted to abrogate the processing of the p100 NFKB2 protein to its active p52 form. Isolated early-onset ACTH deficiency is rare and C-terminal region NFKB2 mutations should be considered as an etiology even in the absence of a clinical history of CVID. Early immunologic evaluation is indicated in the diagnosis and management of isolated ACTH deficiency.

    View details for DOI 10.1210/jc.2017-00341

    View details for PubMedID 28472507

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers CELL STEM CELL Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 20 (4): 490-?

  • Gpr124 is essential for blood-brain barrier integrity in central nervous system disease NATURE MEDICINE Chang, J., Mancuso, M. R., Maier, C., Liang, X., Yuki, K., Yang, L., Kwong, J. W., Wang, J., Rao, V., Vallon, M., Kosinski, C., Zhang, J. J., Mah, A. T., Xu, L., Li, L., Gholamin, S., Reyes, T. F., Li, R., Kuhnert, F., Han, X., Yuan, J., Chiou, S., Brettman, A. D., Daly, L., Corney, D. C., Cheshier, S. H., Shortliffe, L. D., Wu, X., Snyder, M., Chan, P., Giffard, R. G., Chang, H. Y., Andreasson, K., Kuo, C. J. 2017; 23 (4): 450-?

    Abstract

    Although blood-brain barrier (BBB) compromise is central to the etiology of diverse central nervous system (CNS) disorders, endothelial receptor proteins that control BBB function are poorly defined. The endothelial G-protein-coupled receptor (GPCR) Gpr124 has been reported to be required for normal forebrain angiogenesis and BBB function in mouse embryos, but the role of this receptor in adult animals is unknown. Here Gpr124 conditional knockout (CKO) in the endothelia of adult mice did not affect homeostatic BBB integrity, but resulted in BBB disruption and microvascular hemorrhage in mouse models of both ischemic stroke and glioblastoma, accompanied by reduced cerebrovascular canonical Wnt-β-catenin signaling. Constitutive activation of Wnt-β-catenin signaling fully corrected the BBB disruption and hemorrhage defects of Gpr124-CKO mice, with rescue of the endothelial gene tight junction, pericyte coverage and extracellular-matrix deficits. We thus identify Gpr124 as an endothelial GPCR specifically required for endothelial Wnt signaling and BBB integrity under pathological conditions in adult mice. This finding implicates Gpr124 as a potential therapeutic target for human CNS disorders characterized by BBB disruption.

    View details for DOI 10.1038/nm.4309

    View details for PubMedID 28288111

  • Induced Pluripotent Stem Cell Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Sa, S., Gu, M., Chappe, J., Shao, N., Ameen, M., Elliott, K. A., Li, D., Grubert, F., Li, C. G., Taylor, S., Cao, A., Ma, Y., Fong, R., Nguyen, L., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 195 (7): 930-941

  • Characterization of the Dynamic Transcriptome of a Herpesvirus with Long-read Single Molecule Real-Time Sequencing. Scientific reports Tombácz, D., Balázs, Z., Csabai, Z., Moldován, N., Szucs, A., Sharon, D., Snyder, M., Boldogkoi, Z. 2017; 7: 43751

    Abstract

    Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique that is based on PacBio Single Molecule Real-time sequencing platform, was used for quantifying the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length- and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and to then classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.

    View details for DOI 10.1038/srep43751

    View details for PubMedID 28256586

    View details for PubMedCentralID PMC5335617

  • Association of AHSG with alopecia and mental retardation (APMR) syndrome. Human genetics Reza Sailani, M., Jahanbani, F., Nasiri, J., Behnam, M., Salehi, M., Sedghi, M., Hoseinzadeh, M., Takahashi, S., Zia, A., Gruber, J., Lynch, J. L., Lam, D., Winkelmann, J., Amirkiai, S., Pang, B., Rego, S., Mazroui, S., Bernstein, J. A., Snyder, M. P. 2017; 136 (3): 287-296

    Abstract

    Alopecia with mental retardation syndrome (APMR) is a very rare autosomal recessive condition that is associated with total or partial absence of hair from the scalp and other parts of the body as well as variable intellectual disability. Here we present whole-exome sequencing results of a large consanguineous family segregating APMR syndrome with seven affected family members. Our study revealed a novel predicted pathogenic, homozygous missense mutation in the AHSG (OMIM 138680) gene (AHSG: NM_001622:exon7:c.950G>A:p.Arg317His). The variant is predicted to affect a region of the protein required for protein processing and disrupts a phosphorylation motif. In addition, the altered protein migrates with an aberrant size relative to healthy individuals. Consistent with the phenotype, AHSG maps within APMR linkage region 1 (APMR 1) as reported before, and falls within runs of homozygosity (ROH). Previous families with APMR syndrome have been studied through linkage analyses and the linkage resolution did not allow pointing out to a single gene candidate. Our study is the first report to identify a homozygous missense mutation for APMR syndrome through whole-exome sequencing.

    View details for DOI 10.1007/s00439-016-1756-5

    View details for PubMedID 28054173

  • A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N-1-methyladenosine modification RNA Cenik, C., Chua, H. N., Singh, G., Akef, A., Snyder, M. P., Palazzo, A. F., Moore, M. J., Roth, F. P. 2017; 23 (3): 270-283

    Abstract

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.

    View details for DOI 10.1261/rna.059105.116

    View details for Web of Science ID 000394467500002

    View details for PubMedCentralID PMC5311483

  • -methyladenosine modification. RNA (New York, N.Y.) Cenik, C., Chua, H. N., Singh, G., Akef, A., Snyder, M. P., Palazzo, A. F., Moore, M. J., Roth, F. P. 2017; 23 (3): 270-283

    Abstract

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.

    View details for DOI 10.1261/rna.059105.116

    View details for PubMedID 27994090

    View details for PubMedCentralID PMC5311483

  • Single cell transcriptomics reveals unanticipated features of early hematopoietic precursors. Nucleic acids research Yang, J., Tanaka, Y., Seay, M., Li, Z., Jin, J., Garmire, L. X., Zhu, X., Taylor, A., Li, W., Euskirchen, G., Halene, S., Kluger, Y., Snyder, M. P., Park, I. H., Pan, X., Weissman, S. M. 2017; 45 (3): 1281-1296

    Abstract

    Molecular changes underlying stem cell differentiation are of fundamental interest. scRNA-seq on murine hematopoietic stem cells (HSC) and their progeny MPP1 separated the cells into 3 main clusters with distinct features: active, quiescent, and an un-characterized cluster. Induction of anemia resulted in mobilization of the quiescent to the active cluster and of the early to later stage of cell cycle, with marked increase in expression of certain transcription factors (TFs) while maintaining expression of interferon response genes. Cells with surface markers of long term HSC increased the expression of a group of TFs expressed highly in normal cycling MPP1 cells. However, at least Id1 and Hes1 were significantly activated in both HSC and MPP1 cells in anemic mice. Lineage-specific genes were differently expressed between cells, and correlated with the cell cycle stages with a specific augmentation of erythroid related genes in the G2/M phase. Most lineage specific TFs were stochastically expressed in the early precursor cells, but a few, such as Klf1, were detected only at very low levels in few precursor cells. The activation of these factors may correlate with stages of differentiation. This study reveals effects of cell cycle progression on the expression of lineage specific genes in precursor cells, and suggests that hematopoietic stress changes the balance of renewal and differentiation in these homeostatic cells.

    View details for DOI 10.1093/nar/gkw1214

    View details for PubMedID 28003475

    View details for Web of Science ID 000397008000025

    View details for PubMedCentralID PMC5388401

  • Genetic Adaptation of Porcine Circovirus Type 1 to Cultured Porcine Kidney Cells Revealed by Single-Molecule Long-Read Sequencing Technology MICROBIOLOGY RESOURCE ANNOUNCEMENTS Tombacz, D., Moldovan, N., Balazs, Z., Csabai, Z., Snyder, M., Boldogkoi, Z. 2017; 5 (5)

    Abstract

    Porcine circovirus type 1 (PCV1) is a nonpathogenic circovirus, and a contaminant of the porcine kidney (PK-15) cell line. We present the complete and annotated genome sequence of strain Szeged of PCV1, determined by Pacific Biosciences RSII long-read sequencing platform.

    View details for PubMedID 28153895

  • Multi-Platform Sequencing Approach Reveals a Novel Transcriptome Profile in Pseudorabies Virus. Frontiers in microbiology Moldován, N., Tombácz, D., Szűcs, A., Csabai, Z., Snyder, M., Boldogkői, Z. 2017; 8: 2708

    Abstract

    Third-generation sequencing is an emerging technology that is capable of solving several problems that earlier approaches were not able to, including the identification of transcript isoforms and overlapping transcripts. In this study, we used long-read sequencing for the analysis of the pseudorabies virus (PRV) transcriptome, including Oxford Nanopore Technologies MinION, PacBio RS-II, and Illumina HiScanSQ platforms. We also used data from our previous short-read and long-read sequencing studies to compare the results and to confirm the obtained data. Our investigations identified 19 formerly unknown putative protein-coding genes, all of which are 5' truncated forms of earlier annotated longer PRV genes. Additionally, we detected 19 non-coding RNAs, including 5' and 3' truncated transcripts without in-frame ORFs, antisense RNAs, as well as RNA molecules encoded by those parts of the viral genome where no transcription had been detected before. This study has also led to the identification of three complex transcripts and 50 distinct length isoforms, including transcription start and end variants. We also detected 121 novel transcript overlaps, and two transcripts that overlap the replication origins of PRV. Furthermore, in silico analysis revealed 145 upstream ORFs, many of which are located on the longer 5' isoforms of the transcripts.

    View details for DOI 10.3389/fmicb.2017.02708

    View details for PubMedID 29403453

    View details for PubMedCentralID PMC5786565

  • Pharmacological rescue of diabetic skeletal stem cell niches. Science translational medicine Tevlin, R., Seo, E. Y., Marecic, O., McArdle, A., Tong, X., Zimdahl, B., Malkovskiy, A., Sinha, R., Gulati, G., Li, X., Wearda, T., Morganti, R., Lopez, M., Ransom, R. C., Duldulao, C. R., Rodrigues, M., Nguyen, A., Januszyk, M., Maan, Z., Paik, K., Yapa, K., Rajadas, J., Wan, D. C., Gurtner, G. C., Snyder, M., Beachy, P. A., Yang, F., Goodman, S. B., Weissman, I. L., Chan, C. K., Longaker, M. T. 2017; 9 (372)

    Abstract

    Diabetes mellitus (DM) is a metabolic disease frequently associated with impaired bone healing. Despite its increasing prevalence worldwide, the molecular etiology of DM-linked skeletal complications remains poorly defined. Using advanced stem cell characterization techniques, we analyzed intrinsic and extrinsic determinants of mouse skeletal stem cell (mSSC) function to identify specific mSSC niche-related abnormalities that could impair skeletal repair in diabetic (Db) mice. We discovered that high serum concentrations of tumor necrosis factor-α directly repressed the expression of Indian hedgehog (Ihh) in mSSCs and in their downstream skeletogenic progenitors in Db mice. When hedgehog signaling was inhibited during fracture repair, injury-induced mSSC expansion was suppressed, resulting in impaired healing. We reversed this deficiency by precise delivery of purified Ihh to the fracture site via a specially formulated, slow-release hydrogel. In the presence of exogenous Ihh, the injury-induced expansion and osteogenic potential of mSSCs were restored, culminating in the rescue of Db bone healing. Our results present a feasible strategy for precise treatment of molecular aberrations in stem and progenitor cell populations to correct skeletal manifestations of systemic disease.

    View details for DOI 10.1126/scitranslmed.aag2809

    View details for PubMedID 28077677

  • ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis. Nucleic acids research Li, G., Chen, Y., Snyder, M. P., Zhang, M. Q. 2017; 45 (1)

    Abstract

    ChIA-PET2 is a versatile and flexible pipeline for analyzing different types of ChIA-PET data from raw sequencing reads to chromatin loops. ChIA-PET2 integrates all steps required for ChIA-PET data analysis, including linker trimming, read alignment, duplicate removal, peak calling and chromatin loop calling. It supports different kinds of ChIA-PET data generated from different ChIA-PET protocols and also provides quality controls for different steps of ChIA-PET analysis. In addition, ChIA-PET2 can use phased genotype data to call allele-specific chromatin interactions. We applied ChIA-PET2 to different ChIA-PET datasets, demonstrating its significantly improved performance as well as its ability to easily process ChIA-PET raw data. ChIA-PET2 is available at https://github.com/GuipengLi/ChIA-PET2.

    View details for DOI 10.1093/nar/gkw809

    View details for PubMedID 27625391

    View details for PubMedCentralID PMC5224499

  • Identification of a novel mutation in APTX gene associated with Ataxia-oculomotor apraxia. Cold Spring Harbor molecular case studies Inlora, J., Sailani, M. R., Khodadadi, H., Teymurinezhad, A., Takahashi, S., Bernstein, J. A., Garshasbi, M., Snyder, M. P. 2017

    Abstract

    Hereditary ataxias are a clinically and genetically heterogeneous family of disorders defined by the inability to control gait and muscle coordination. Given the non-specific symptoms of many hereditary ataxias, precise diagnosis relies on molecular genetic testing. To this end, we conducted whole exome sequencing (WES) on a large consanguineous Iranian family with hereditary ataxia and oculomotor apraxia. WES in five affected and six unaffected individuals resulted in the identification of a novel homozygous stop-gain mutation in the APTX gene (c.739T>A; p.Lys247Ter) that segregates with the phenotype. Mutations in the APTX gene are associated with ataxia with oculomotor apraxia type 1 (AOA1).

    View details for PubMedID 28652255

  • Genome-Wide Temporal Profiling of Transcriptome and Open Chromatin of Early Cardiomyocyte Differentiation Derived From hiPSCs and hESCs. Circulation research Liu, Q., Jiang, C., Xu, J., Zhao, M. T., Van Bortle, K., Cheng, X., Wang, G., Chang, H. Y., Wu, J. C., Snyder, M. P. 2017; 121 (4): 376–91

    Abstract

    Recent advances have improved our ability to generate cardiomyocytes from human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs). However, our understanding of the transcriptional regulatory networks underlying early stages (ie, from mesoderm to cardiac mesoderm) of cardiomyocyte differentiation remains limited. We aimed to characterize the transcriptome and chromatin accessibility during early cardiomyocyte differentiation from hiPSCs and hESCs. We profiled the temporal changes in transcriptome and chromatin accessibility at genome-wide levels during cardiomyocyte differentiation derived from 2 hiPSC lines and 2 hESC lines at 4 stages: pluripotent stem cells, mesoderm, cardiac mesoderm, and differentiated cardiomyocytes. Overall, RNA sequencing analysis revealed that transcriptomes during early cardiomyocyte differentiation were highly concordant between hiPSCs and hESCs, and clustering of 4 cell lines within each time point demonstrated that changes in genome-wide chromatin accessibility were similar across hiPSC and hESC cell lines. Weighted gene co-expression network analysis (WGCNA) identified several modules that were strongly correlated with different stages of cardiomyocyte differentiation. Several novel genes were identified with high weighted connectivity within modules and exhibited coexpression patterns with other genes, including noncoding RNA LINC01124 and uncharacterized RNA AK127400 in the module related to the mesoderm stage; E-box-binding homeobox 1 (ZEB1) in the module correlated with postcardiac mesoderm. We further demonstrated that ZEB1 is required for early cardiomyocyte differentiation. In addition, based on integrative analysis of both WGCNA and transcription factor motif enrichment analysis, we determined numerous transcription factors likely to play important roles at different stages during cardiomyocyte differentiation, such as T and eomesodermin (EOMES; mesoderm), lymphoid enhancer-binding factor 1 (LEF1) and mesoderm posterior BHLH transcription factor 1 (MESP1; from mesoderm to cardiac mesoderm), meis homeobox 1 (MEIS1) and GATA-binding protein 4 (GATA4) (postcardiac mesoderm), JUN and FOS families, and MEIS2 (cardiomyocyte). Both hiPSCs and hESCs share similar transcriptional regulatory mechanisms underlying early cardiac differentiation, and our results have revealed transcriptional regulatory networks and new factors (eg, ZEB1) controlling early stages of cardiomyocyte differentiation.

    View details for PubMedID 28663367

  • WISP3 mutation associated with Pseudorheumatoid Dysplasia. Cold Spring Harbor molecular case studies Sailani, M. R., Chappell, J., Inlora, J., Lynch, L., Narasimha, A., Mazroui, S., Zia, A., Bernstein, J., Aryani, O., Snyder, M. P. 2017

    Abstract

    Progressive pseudorheumatoid dysplasia (PPD) is a skeletal dysplasia characterized by predominant involvement of articular cartilage with progressive joint stiffness. Here we report genetic characterization of a consanguineous family segregating an uncharacterized form of skeletal dysplasia. Whole exome sequencing of four affected siblings and their parents identified a homozygous loss-of-function mutation in the WISP3 gene, leading to diagnosis of PPD in the affected individuals. The identified variant (chr6: 112382301; WISP3:c.156C>A p.Cys52*) is rare and predicted to cause premature termination of the WISP3 protein.

    View details for PubMedID 29092958

  • Topological organization and dynamic regulation of human tRNA genes during macrophage differentiation. Genome biology Van Bortle, K., Phanstiel, D. H., Snyder, M. P. 2017; 18 (1): 180

    Abstract

    The human genome is hierarchically organized into local and long-range structures that help shape cell-type-specific transcription patterns. Transfer RNA (tRNA) genes (tDNAs), which are transcribed by RNA polymerase III (RNAPIII) and encode RNA molecules responsible for translation, are dispersed throughout the genome and, in many cases, linearly organized into genomic clusters with other tDNAs. Whether the location and three-dimensional organization of tDNAs contribute to the activity of these genes has remained difficult to address, due in part to unique challenges related to tRNA sequencing. We therefore devised integrated tDNA expression profiling, a method that combines RNAPIII mapping with biotin-capture of nascent tRNAs. We apply this method to the study of dynamic tRNA gene regulation during macrophage development and further integrate these data with high-resolution maps of 3D chromatin structure. Integrated tDNA expression profiling reveals domain-level and loop-based organization of tRNA gene transcription during cellular differentiation. tRNA genes connected by DNA loops, which are proximal to CTCF binding sites and expressed at elevated levels compared to non-loop tDNAs, change coordinately with tDNAs and protein-coding genes at distal ends of interactions mapped by in situ Hi-C. We find that downregulated tRNA genes are specifically marked by enhanced promoter-proximal binding of MAF1, a transcriptional repressor of RNAPIII activity, altogether revealing multiple levels of tDNA regulation during cellular differentiation. We present evidence of both local and coordinated long-range regulation of human tDNA expression, suggesting the location and organization of tRNA genes contribute to dynamic tDNA activity during macrophage development.

    View details for PubMedID 28931413

    View details for PubMedCentralID PMC5607496

  • GATTACA: Lightweight Metagenomic Binning Using Kmer Counting Popic, V., Kuleshov, V., Snyder, M., Batzoglou, S., Sahinalp, S. C. SPRINGER-VERLAG BERLIN. 2017: 391–92

  • Cell Type-Specific Chromatin Signatures Underline Regulatory DNA Elements in Human Induced Pluripotent Stem Cells and Somatic Cells. Circulation research Zhao, M. T., Shao, N. Y., Hu, S., Ma, N., Srinivasan, R., Jahanbani, F., Lee, J., Zhang, S. L., Snyder, M. P., Wu, J. C. 2017; 121 (11): 1237–50

    Abstract

    Regulatory DNA elements in the human genome play important roles in determining the transcriptional abundance and spatiotemporal gene expression during embryonic heart development and somatic cell reprogramming. It is not well known how chromatin marks in regulatory DNA elements are modulated to establish cell type-specific gene expression in the human heart. We aimed to decipher the cell type-specific epigenetic signatures in regulatory DNA elements and how they modulate heart-specific gene expression. We profiled genome-wide transcriptional activity and a variety of epigenetic marks in the regulatory DNA elements using massive RNA-seq (n=12) and ChIP-seq (chromatin immunoprecipitation combined with high-throughput sequencing; n=84) in human endothelial cells (CD31+CD144+), cardiac progenitor cells (Sca-1+), fibroblasts (DDR2+), and their respective induced pluripotent stem cells. We uncovered 2 classes of regulatory DNA elements: class I was identified with ubiquitous enhancer (H3K4me1) and promoter (H3K4me3) marks in all cell types, whereas class II was enriched with H3K4me1 and H3K4me3 in a cell type-specific manner. Both class I and class II regulatory elements exhibited stimulatory roles in nearby gene expression in a given cell type. However, class I promoters displayed more dominant regulatory effects on transcriptional abundance regardless of distal enhancers. Transcription factor network analysis indicated that human induced pluripotent stem cells and somatic cells from the heart selected their preferential regulatory elements to maintain cell type-specific gene expression. In addition, we validated the function of these enhancer elements in transgenic mouse embryos and human cells and identified a few enhancers that could possibly regulate the cardiac-specific gene expression. Given that a large number of genetic variants associated with human diseases are located in regulatory DNA elements, our study provides valuable resources for deciphering the epigenetic modulation of regulatory DNA elements that fine-tune spatiotemporal gene expression in human cardiac development and diseases.

    View details for PubMedID 29030344

    View details for PubMedCentralID PMC5773062

  • Dynamic landscape and regulation of RNA editing in mammals. Nature Tan, M. H., Li, Q., Shanmugam, R., Piskol, R., Kohler, J., Young, A. N., Liu, K. I., Zhang, R., Ramaswami, G., Ariyoshi, K., Gupte, A., Keegan, L. P., George, C. X., Ramu, A., Huang, N., Pollina, E. A., Leeman, D. S., Rustighi, A., Goh, Y. P., Chawla, A., Del Sal, G., Peltz, G., Brunet, A., Conrad, D. F., Samuel, C. E., O'Connell, M. A., Walkley, C. R., Nishikura, K., Li, J. B. 2017; 550 (7675): 249–54

    Abstract

    Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.

    View details for PubMedID 29022589

  • Landscape of X chromosome inactivation across human tissues. Nature Tukiainen, T., Villani, A. C., Yen, A., Rivas, M. A., Marshall, J. L., Satija, R., Aguirre, M., Gauthier, L., Fleharty, M., Kirby, A., Cummings, B. B., Castel, S. E., Karczewski, K. J., Aguet, F., Byrnes, A., Lappalainen, T., Regev, A., Ardlie, K. G., Hacohen, N., MacArthur, D. G. 2017; 550 (7675): 244–48

    Abstract

    X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.

    View details for PubMedID 29022598

  • Molecular and functional resemblance of differentiated cells derived from isogenic human iPSCs and SCNT-derived ESCs. Proceedings of the National Academy of Sciences of the United States of America Zhao, M. T., Chen, H., Liu, Q., Shao, N. Y., Sayed, N., Wo, H. T., Zhang, J. Z., Ong, S. G., Liu, C., Kim, Y., Yang, H., Chour, T., Ma, H., Gutierrez, N. M., Karakikes, I., Mitalipov, S., Snyder, M. P., Wu, J. C. 2017

    Abstract

    Patient-specific pluripotent stem cells (PSCs) can be generated via nuclear reprogramming by transcription factors (i.e., induced pluripotent stem cells, iPSCs) or by somatic cell nuclear transfer (SCNT). However, abnormalities and preclinical application of differentiated cells generated by different reprogramming mechanisms have yet to be evaluated. Here we investigated the molecular and functional features, and drug response of cardiomyocytes (PSC-CMs) and endothelial cells (PSC-ECs) derived from genetically relevant sets of human iPSCs, SCNT-derived embryonic stem cells (nt-ESCs), as well as in vitro fertilization embryo-derived ESCs (IVF-ESCs). We found that differentiated cells derived from isogenic iPSCs and nt-ESCs showed comparable lineage gene expression, cellular heterogeneity, physiological properties, and metabolic functions. Genome-wide transcriptome and DNA methylome analysis indicated that iPSC derivatives (iPSC-CMs and iPSC-ECs) were more similar to isogenic nt-ESC counterparts than those derived from IVF-ESCs. Although iPSCs and nt-ESCs shared the same nuclear DNA and yet carried different sources of mitochondrial DNA, CMs derived from iPSC and nt-ESCs could both recapitulate doxorubicin-induced cardiotoxicity and exhibited insignificant differences on reactive oxygen species generation in response to stress condition. We conclude that molecular and functional characteristics of differentiated cells from human PSCs are primarily attributed to the genetic compositions rather than the reprogramming mechanisms (SCNT vs. iPSCs). Therefore, human iPSCs can replace nt-ESCs as alternatives for generating patient-specific differentiated cells for disease modeling and preclinical drug testing.

    View details for PubMedID 29203658

  • Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nature genetics 2017; 49 (12): 1664–70

    Abstract

    Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.

    View details for DOI 10.1038/ng.3969

    View details for PubMedID 29019975

  • The impact of rare variation on gene expression across tissues. Nature Li, X., Kim, Y., Tsang, E. K., Davis, J. R., Damani, F. N., Chiang, C., Hess, G. T., Zappala, Z., Strober, B. J., Scott, A. J., Li, A., Ganna, A., Bassik, M. C., Merker, J. D., Hall, I. M., Battle, A., Montgomery, S. B. 2017; 550 (7675): 239–43

    Abstract

    Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.

    View details for PubMedID 29022581

  • Genetic effects on gene expression across human tissues. Nature Battle, A., Brown, C. D., Engelhardt, B. E., Montgomery, S. B. 2017; 550 (7675): 204–13

    Abstract

    Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

    View details for PubMedID 29022597

  • Lineage-specific dynamic and pre-established enhancer-promoter contacts cooperate in terminal differentiation. Nature genetics Rubin, A. J., Barajas, B. C., Furlan-Magaril, M., Lopez-Pajares, V., Mumbach, M. R., Howard, I., Kim, D. S., Boxer, L. D., Cairns, J., Spivakov, M., Wingett, S. W., Shi, M., Zhao, Z., Greenleaf, W. J., Kundaje, A., Snyder, M., Chang, H. Y., Fraser, P., Khavari, P. A. 2017; 49 (10): 1522–28

    Abstract

    Chromosome conformation is an important feature of metazoan gene regulation; however, enhancer-promoter contact remodeling during cellular differentiation remains poorly understood. To address this, genome-wide promoter capture Hi-C (CHi-C) was performed during epidermal differentiation. Two classes of enhancer-promoter contacts associated with differentiation-induced genes were identified. The first class ('gained') increased in contact strength during differentiation in concert with enhancer acquisition of the H3K27ac activation mark. The second class ('stable') were pre-established in undifferentiated cells, with enhancers constitutively marked by H3K27ac. The stable class was associated with the canonical conformation regulator cohesin, whereas the gained class was not, implying distinct mechanisms of contact formation and regulation. Analysis of stable enhancers identified a new, essential role for a constitutively expressed, lineage-restricted ETS-family transcription factor, EHF, in epidermal differentiation. Furthermore, neither class of contacts was observed in pluripotent cells, suggesting that lineage-specific chromatin structure is established in tissue progenitor cells and is further remodeled in terminal differentiation.

    View details for PubMedID 28805829

  • Cloud-based Interactive Analytics for Terabytes of Genomic Variants Data Bioinformatics Pan, C., McInnes, G., Deflaux, N., Snyder, M. P., Bingham, J., Datta, S., Tsao, P. S. 2017: 3709–15

    Abstract

    Large-scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact: cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btx468

    View details for PubMedCentralID PMC5860318

  • Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis CELL Ang, Y., Rivas, R. N., Ribeiro, A. J., Srivas, R., Rivera, J., Stone, N. R., Pratt, K., Mohamed, T. M., Fu, J., Spencer, C. I., Tippens, N. D., Li, M., Narasimha, A., Radzinsky, E., Moon-Grady, A. J., Yu, H., Pruitt, B. L., Snyder, M. P., Srivastava, D. 2016; 167 (7): 1734-?

    Abstract

    Mutation of highly conserved residues in transcription factors may affect protein-protein or protein-DNA interactions, leading to gene network dysregulation and human disease. Human mutations in GATA4, a cardiogenic transcription factor, cause cardiac septal defects and cardiomyopathy. Here, iPS-derived cardiomyocytes from subjects with a heterozygous GATA4-G296S missense mutation showed impaired contractility, calcium handling, and metabolic activity. In human cardiomyocytes, GATA4 broadly co-occupied cardiac enhancers with TBX5, another transcription factor that causes septal defects when mutated. The GATA4-G296S mutation disrupted TBX5 recruitment, particularly to cardiac super-enhancers, concomitant with dysregulation of genes related to the phenotypic abnormalities, including cardiac septation. Conversely, the GATA4-G296S mutation led to failure of GATA4 and TBX5-mediated repression at non-cardiac genes and enhanced open chromatin states at endothelial/endocardial promoters. These results reveal how disease-causing missense mutations can disrupt transcriptional cooperativity, leading to aberrant chromatin states and cellular dysfunction, including those related to morphogenetic defects.

    View details for DOI 10.1016/j.cell.2016.11.033

    View details for Web of Science ID 000393114700013

    View details for PubMedID 27984724

    View details for PubMedCentralID PMC5180611

  • Can heavy isotopes increase lifespan? Studies of relative abundance in various organisms reveal chemical perspectives on aging. BioEssays Li, X., Snyder, M. P. 2016; 38 (11): 1093-1101

    Abstract

    Stable heavy isotopes co-exist with their lighter counterparts in all elements commonly found in biology. These heavy isotopes represent a low natural abundance in isotopic composition but impose great retardation effects in chemical reactions because of kinetic isotopic effects (KIEs). Previous isotope analyses have recorded pervasive enrichment or depletion of heavy isotopes in various organisms, strongly supporting the capability of biological systems to distinguish different isotopes. This capability has recently been found to lead to general decline of heavy isotopes in metabolites during yeast aging. Conversely, supplementing heavy isotopes in growth medium promotes longevity. Whether this observation prevails in other organisms is not known, but it potentially bears promise in promoting human longevity.

    View details for DOI 10.1002/bies.201600040

    View details for PubMedID 27554342

    View details for PubMedCentralID PMC5108472

  • iPSC Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity. American journal of respiratory and critical care medicine Sa, S., Gu, M., Chappell, J., Shao, N., Ameen, M., Elliott, K. A., Li, D., Grubert, F., Li, C. G., Taylor, S., Cao, A., Ma, Y., Fong, R., Nguyen, L., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2016

    Abstract

    Idiopathic or heritable pulmonary arterial hypertension is characterized by loss and obliteration of lung vasculature. Endothelial cell dysfunction is pivotal to the pathophysiology, but different causal mechanisms may reflect a need for patient-tailored therapies. Endothelial cells differentiated from induced pluripotent stem cells were compared to pulmonary arterial endothelial cells from the same patients with idiopathic or heritable pulmonary arterial hypertension to determine whether they shared functional abnormalities and altered gene expression patterns that differed from those in unused donor cells. We then investigated whether endothelial cells differentiated from pluripotent cells could serve as surrogates to test emerging therapies. Functional changes assessed included adhesion, migration, tube formation, and propensity to apoptosis. Expression of BMPR2 and its target, collagen IV, pSMAD1/5 signaling, and transcriptomic profiles were also analyzed. Native pulmonary arterial and induced pluripotent stem cell-derived endothelial cells from idiopathic and heritable pulmonary arterial hypertension patients, compared to controls, showed a similar reduction in adhesion, migration, survival, and tube formation, and decreased BMPR2 and downstream signaling and collagen IV expression. Transcriptomic profiling revealed high KISS1, related to reduced migration, and low CES1, related to impaired survival, in patient cells. A beneficial angiogenic response to potential therapies, FK-506 and Elafin, was related to reduced SLIT3, an anti-migratory factor. Despite the site of disease in the lung, our study indicates that induced pluripotent stem cell-derived endothelial cells are useful surrogates to uncover novel features related to disease mechanisms and to better match patients to therapies.

    View details for PubMedID 27779452

  • Nat1 Deficiency Is Associated with Mitochondrial Dysfunction and Exercise Intolerance in Mice CELL REPORTS Chennamsetty, I., Coronado, M., Contrepois, K., Keller, M. P., Carcamo-Orive, I., Sandin, J., Fajardo, G., Whittle, A. J., Fathzadeh, M., Snyder, M., Reaven, G., Attie, A. D., Bernstein, D., Quertermous, T., Knowles, J. W. 2016; 17 (2): 527-540

    Abstract

    We recently identified human N-acetyltransferase 2 (NAT2) as an insulin resistance (IR) gene. Here, we examine the cellular mechanism linking NAT2 to IR and find that Nat1 (mouse ortholog of NAT2) is co-regulated with key mitochondrial genes. RNAi-mediated silencing of Nat1 led to mitochondrial dysfunction characterized by increased intracellular reactive oxygen species and mitochondrial fragmentation as well as decreased mitochondrial membrane potential, biogenesis, mass, cellular respiration, and ATP generation. These effects were consistent in 3T3-L1 adipocytes, C2C12 myoblasts, and in tissues from Nat1-deficient mice, including white adipose tissue, heart, and skeletal muscle. Nat1-deficient mice had changes in plasma metabolites and lipids consistent with a decreased ability to utilize fats for energy and a decrease in basal metabolic rate and exercise capacity without altered thermogenesis. Collectively, our results suggest that Nat1 deficiency results in mitochondrial dysfunction, which may constitute a mechanistic link between this gene and IR.

    View details for DOI 10.1016/j.celrep.2016.09.005

    View details for Web of Science ID 000385850700019

    View details for PubMedID 27705799

    View details for PubMedCentralID PMC5097870

  • Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature genetics Corces, M. R., Buenrostro, J. D., Wu, B., Greenside, P. G., Chan, S. M., Koenig, J. L., Snyder, M. P., Pritchard, J. K., Kundaje, A., Greenleaf, W. J., Majeti, R., Chang, H. Y. 2016; 48 (10): 1193-1203

    Abstract

    We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.

    View details for DOI 10.1038/ng.3646

    View details for PubMedID 27526324

  • A proposal for validation of antibodies NATURE METHODS Uhlen, M., Bandrowski, A., Carr, S., Edwards, A., Ellenberg, J., Lundberg, E., Rimm, D. L., Rodriguez, H., Hiltke, T., Snyder, M., Yamamoto, T. 2016; 13 (10): 823-?

    View details for DOI 10.1038/NMETH.3995

    View details for Web of Science ID 000385194600015

    View details for PubMedID 27595404

  • Multiple Pairwise Analysis of Non-homologous Centromere Coupling Reveals Preferential Chromosome Size-Dependent Interactions and a Role for Bouquet Formation in Establishing the Interaction Pattern PLOS GENETICS Lefrancois, P., Rockmill, B., Xie, P., Roeder, G. S., Snyder, M. 2016; 12 (10)

    Abstract

    During meiosis, chromosomes undergo a homology search in order to locate their homolog to form stable pairs and exchange genetic material. Early in prophase, chromosomes associate in mostly non-homologous pairs, tethered only at their centromeres. This phenomenon, conserved through higher eukaryotes, is termed centromere coupling in budding yeast. Both initiation of recombination and the presence of homologs are dispensable for centromere coupling (occurring in spo11 mutants and haploids induced to undergo meiosis) but the presence of the synaptonemal complex (SC) protein Zip1 is required. The nature and mechanism of coupling have yet to be elucidated. Here we present the first pairwise analysis of centromere coupling in an effort to uncover underlying rules that may exist within these non-homologous interactions. We designed a novel chromosome conformation capture (3C)-based assay to detect all possible interactions between non-homologous yeast centromeres during early meiosis. Using this variant of 3C-qPCR, we found a size-dependent interaction pattern, in which chromosomes assort preferentially with chromosomes of similar sizes, in haploid and diploid spo11 cells, but not in a coupling-defective mutant (spo11 zip1 haploid and diploid yeast). This pattern is also observed in wild-type diploids early in meiosis but disappears as meiosis progresses and homologous chromosomes pair. We found no evidence to support the notion that ancestral centromere homology plays a role in pattern establishment in S. cerevisiae post-genome duplication. Moreover, we found a role for the meiotic bouquet in establishing the size dependence of centromere coupling, as abolishing bouquet (using the bouquet-defective spo11 ndj1 mutant) reduces it. Coupling in spo11 ndj1 rather follows telomere clustering preferences. We propose that a chromosome size preference for centromere coupling helps establish efficient homolog recognition.

    View details for DOI 10.1371/journal.pgen.1006347

    View details for Web of Science ID 000386683300016

    View details for PubMedID 27768699

    View details for PubMedCentralID PMC5074576

  • iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nature cell biology Kodo, K., Ong, S., Jahanbani, F., Termglinchan, V., Hirono, K., Inanloorahatloo, K., Ebert, A. D., Shukla, P., Abilez, O. J., Churko, J. M., Karakikes, I., Jung, G., Ichida, F., Wu, S. M., Snyder, M. P., Bernstein, D., Wu, J. C. 2016; 18 (10): 1031-1042

    Abstract

    Left ventricular non-compaction (LVNC) is the third most prevalent cardiomyopathy in children and its pathogenesis has been associated with the developmental defect of the embryonic myocardium. We show that patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) generated from LVNC patients carrying a mutation in the cardiac transcription factor TBX20 recapitulate a key aspect of the pathological phenotype at the single-cell level and this was associated with perturbed transforming growth factor beta (TGF-β) signalling. LVNC iPSC-CMs have decreased proliferative capacity due to abnormal activation of TGF-β signalling. TBX20 regulates the expression of TGF-β signalling modifiers including one known to be a genetic cause of LVNC, PRDM16, and genome editing of PRDM16 caused proliferation defects in iPSC-CMs. Inhibition of TGF-β signalling and genome correction of the TBX20 mutation were sufficient to reverse the disease phenotype. Our study demonstrates that iPSC-CMs are a useful tool for the exploration of pathological mechanisms underlying poorly understood cardiomyopathies including LVNC.

    View details for DOI 10.1038/ncb3411

    View details for PubMedID 27642787

  • Full-Length Isoform Sequencing Reveals Novel Transcripts and Substantial Transcriptional Overlaps in a Herpesvirus PLOS ONE Tombacz, D., Csabai, Z., Olah, P., Balazs, Z., Liko, I., Zsigmond, L., Sharon, D., Snyder, M., Boldogkoi, Z. 2016; 11 (9)

    Abstract

    Whole transcriptome studies have become essential for understanding the complexity of genetic regulation. However, the conventionally applied short-read sequencing platforms cannot be used to reliably distinguish between many transcript isoforms. The Pacific Biosciences (PacBio) RS II platform is capable of reading long nucleic acid stretches in a single sequencing run. The pseudorabies virus (PRV) is an excellent system to study herpesvirus gene expression and potential interactions between the transcriptional units. In this work, non-amplified and amplified isoform sequencing protocols were used to characterize the poly(A+) fraction of the lytic transcriptome of PRV, with the aim of a complete transcriptional annotation of the viral genes. The analyses revealed a previously unrecognized complexity of the PRV transcriptome including the discovery of novel protein-coding and non-coding genes, novel mono- and polycistronic transcription units, as well as extensive transcriptional overlaps between neighboring and distal genes. This study identified non-coding transcripts overlapping all three replication origins of the PRV, which might play a role in the control of DNA synthesis. We additionally established the relative expression levels of gene products. Our investigations revealed that the whole PRV genome is utilized for transcription, including both DNA strands in all coding and intergenic regions. The genome-wide occurrence of transcript overlaps suggests a crosstalk between genes through a network formed by interacting transcriptional machineries with a potential function in the control of gene expression.

    View details for DOI 10.1371/journal.pone.0162868

    View details for Web of Science ID 000384328500015

    View details for PubMedID 27685795

    View details for PubMedCentralID PMC5042381

  • Transcriptome Profiling of Patient-Specific Human iPSC-Cardiomyocytes Predicts Individual Drug Safety and Efficacy Responses In Vitro. Cell stem cell Matsa, E., Burridge, P. W., Yu, K., Ahrens, J. H., Termglinchan, V., Wu, H., Liu, C., Shukla, P., Sayed, N., Churko, J. M., Shao, N., Woo, N. A., Chao, A. S., Gold, J. D., Karakikes, I., Snyder, M. P., Wu, J. C. 2016; 19 (3): 311-325

    Abstract

    Understanding individual susceptibility to drug-induced cardiotoxicity is key to improving patient safety and preventing drug attrition. Human induced pluripotent stem cells (hiPSCs) enable the study of pharmacological and toxicological responses in patient-specific cardiomyocytes (CMs) and may serve as preclinical platforms for precision medicine. Transcriptome profiling in hiPSC-CMs from seven individuals lacking known cardiovascular disease-associated mutations and in three isogenic human heart tissue and hiPSC-CM pairs showed greater inter-patient variation than intra-patient variation, verifying that reprogramming and differentiation preserve patient-specific gene expression, particularly in metabolic and stress-response genes. Transcriptome-based toxicology analysis predicted and risk-stratified patient-specific susceptibility to cardiotoxicity, and functional assays in hiPSC-CMs using tacrolimus and rosiglitazone, drugs targeting pathways predicted to produce cardiotoxicity, validated inter-patient differential responses. CRISPR/Cas9-mediated pathway correction prevented drug-induced cardiotoxicity. Our data suggest that hiPSC-CMs can be used in vitro to predict and validate patient-specific drug safety and efficacy, potentially enabling future clinical approaches to precision medicine.

    View details for DOI 10.1016/j.stem.2016.07.006

    View details for PubMedID 27545504

  • Predicting Ovarian Cancer Patients' Clinical Response to Platinum-Based Chemotherapy by Their Tumor Proteomic Signatures JOURNAL OF PROTEOME RESEARCH Yu, K., Levine, D. A., Zhang, H., Chan, D. W., Zhang, Z., Snyder, M. 2016; 15 (8): 2455-2465

    Abstract

    Ovarian cancer is the deadliest gynecologic malignancy in the United States with most patients diagnosed in the advanced stage of the disease. Platinum-based antineoplastic therapeutics is indispensable to treating advanced ovarian serous carcinoma. However, patients have heterogeneous responses to platinum drugs, and it is difficult to predict these interindividual differences before administering medication. In this study, we investigated the tumor proteomic profiles and clinical characteristics of 130 ovarian serous carcinoma patients analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), predicted the platinum drug response using supervised machine learning methods, and evaluated our prediction models through leave-one-out cross-validation. Our data-driven feature selection approach indicated that tumor proteomics profiles contain information for predicting binarized platinum response (P < 0.0001). We further built a least absolute shrinkage and selection operator (LASSO)-Cox proportional hazards model that stratified patients into early relapse and late relapse groups (P = 0.00013). The top proteomic features indicative of platinum response were involved in ATP synthesis pathways and Ran GTPase binding. Overall, we demonstrated that proteomic profiles of ovarian serous carcinoma patients predicted platinum drug responses as well as provided insights into the biological processes influencing the efficacy of platinum-based therapeutics. Our analytical approach is also extensible to predicting response to other antineoplastic agents or treatment modalities for both ovarian and other cancers.

    View details for DOI 10.1021/acs.jproteome.5b01129

    View details for Web of Science ID 000381235900010

    View details for PubMedID 27312948

  • EPHB4 kinase-inactivating mutations cause autosomal dominant lymphatic-related hydrops fetalis. journal of clinical investigation Martin-Almedina, S., Martinez-Corral, I., Holdhus, R., Vicente, A., Fotiou, E., Lin, S., Petersen, K., Simpson, M. A., Hoischen, A., Gilissen, C., Jeffery, H., Atton, G., Karapouliou, C., Brice, G., Gordon, K., Wiseman, J. W., Wedin, M., Rockson, S. G., Jeffery, S., Mortimer, P. S., Snyder, M. P., Berland, S., Mansour, S., Makinen, T., Ostergaard, P. 2016; 126 (8): 3080-3088

    Abstract

    Hydrops fetalis describes fluid accumulation in at least 2 fetal compartments, including abdominal cavities, pleura, and pericardium, or in body tissue. The majority of hydrops fetalis cases are nonimmune conditions that present with generalized edema of the fetus, and approximately 15% of these nonimmune cases result from a lymphatic abnormality. Here, we have identified an autosomal dominant, inherited form of lymphatic-related (nonimmune) hydrops fetalis (LRHF). Independent exome sequencing projects on 2 families with a history of in utero and neonatal deaths associated with nonimmune hydrops fetalis uncovered 2 heterozygous missense variants in the gene encoding Eph receptor B4 (EPHB4). Biochemical analysis determined that the mutant EPHB4 proteins are devoid of tyrosine kinase activity, indicating that loss of EPHB4 signaling contributes to LRHF pathogenesis. Further, inactivation of Ephb4 in lymphatic endothelial cells of developing mouse embryos led to defective lymphovenous valve formation and consequent subcutaneous edema. Together, these findings identify EPHB4 as a critical regulator of early lymphatic vascular development and demonstrate that mutations in the gene can cause an autosomal dominant form of LRHF that is associated with a high mortality rate.

    View details for DOI 10.1172/JCI85794

    View details for PubMedID 27400125

    View details for PubMedCentralID PMC4966301

  • Omics Profiling in Precision Oncology. Molecular & cellular proteomics Yu, K., Snyder, M. 2016; 15 (8): 2525-2536

    Abstract

    Cancer causes significant morbidity and mortality worldwide, and is the area most targeted in precision medicine. Recent development of high-throughput methods enables detailed omics analysis of the molecular mechanisms underpinning tumor biology. These studies have identified clinically actionable mutations and gene and protein expression patterns associated with prognosis, and have provided further insights into the molecular mechanisms indicative of cancer biology and new therapeutic strategies such as immunotherapy. In this review, we summarize the techniques used for tumor omics analysis, recapitulate the key findings in cancer omics studies, and point to areas requiring further research on precision oncology.

    View details for DOI 10.1074/mcp.O116.059253

    View details for PubMedID 27099341

  • Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell Zhang, H., Liu, T., Zhang, Z., Payne, S. H., Zhang, B., McDermott, J. E., Zhou, J., Petyuk, V. A., Chen, L., Ray, D., Sun, S., Yang, F., Chen, L., Wang, J., Shah, P., Cha, S. W., Aiyetan, P., Woo, S., Tian, Y., Gritsenko, M. A., Clauss, T. R., Choi, C., Monroe, M. E., Thomas, S., Nie, S., Wu, C., Moore, R. J., Yu, K., Tabb, D. L., Fenyö, D., Bafna, V., Wang, Y., Rodriguez, H., Boja, E. S., Hiltke, T., Rivers, R. C., Sokoll, L., Zhu, H., Shih, I., Cope, L., Pandey, A., Zhang, B., Snyder, M. P., Levine, D. A., Smith, R. D., Chan, D. W., Rodland, K. D. 2016; 166 (3): 755-765

    Abstract

    To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

    View details for DOI 10.1016/j.cell.2016.05.069

    View details for PubMedID 27372738

  • Integrated Network Analysis Reveals an Association between Plasma Mannose Levels and Insulin Resistance CELL METABOLISM Lee, S., Zhang, C., Kilicarslan, M., Piening, B. D., Bjornson, E., Hallstrom, B. M., Groen, A. K., Ferrannini, E., Laakso, M., Snyder, M., Bluher, M., Uhlen, M., Nielsen, J., Smith, U., Serlie, M. J., Boren, J., Mardinoglu, A. 2016; 24 (1): 172-184

    Abstract

    To investigate the biological processes that are altered in obese subjects, we generated cell-specific integrated networks (INs) by merging genome-scale metabolic, transcriptional regulatory and protein-protein interaction networks. We performed genome-wide transcriptomics analysis to determine the global gene expression changes in the liver and three adipose tissues from obese subjects undergoing bariatric surgery and integrated these data into the cell-specific INs. We found dysregulations in mannose metabolism in obese subjects and validated our predictions by detecting mannose levels in the plasma of the lean and obese subjects. We observed significant correlations between plasma mannose levels, BMI, and insulin resistance (IR). We also measured plasma mannose levels of the subjects in two additional different cohorts and observed that an increased plasma mannose level was associated with IR and insulin secretion. We finally identified mannose as one of the best plasma metabolites in explaining the variance in obesity-independent IR.

    View details for DOI 10.1016/j.cmet.2016.05.026

    View details for PubMedID 27345421

  • Using Mass Spectrometry to Quantify Rituximab and Perform Individualized Immunoglobulin Phenotyping in ANCA-Associated Vasculitis ANALYTICAL CHEMISTRY Mills, J. R., Cornec, D., Dasari, S., Ladwig, P. M., Hummel, A. M., Cheu, M., Murray, D. L., Willrich, M. A., Snyder, M. R., Hoffman, G. S., Kallenberg, C. G., Langford, C. A., Merkel, P. A., Monach, P. A., Seo, P., Spiera, R. F., St Clair, E. W., Stone, J. H., Specks, U., Barnidge, D. R. 2016; 88 (12): 6317-6325

    Abstract

    Therapeutic monoclonal immunoglobulins (mAbs) are used to treat patients with a wide range of disorders including autoimmune diseases. As pharmaceutical companies bring more fully humanized therapeutic mAb drugs to the healthcare market, analytical platforms that perform therapeutic drug monitoring (TDM) without relying on mAb-specific reagents will be needed. In this study we demonstrate that liquid-chromatography-mass spectrometry (LC-MS) can be used to perform TDM of mAbs in the same manner as smaller nonbiologic drugs. The assay uses commercially available reagents combined with heavy and light chain disulfide bond reduction followed by light chain analysis by microflow-LC-electrospray ionization-quadrupole-time-of-flight mass spectrometry (ESI-Q-TOF MS). Quantification is performed using the peak areas from multiply charged mAb light chain ions using a software package developed in-house for TDM of mAbs. The data presented here demonstrate the ability of an LC-MS assay to quantify a therapeutic mAb in a large cohort of patients in a clinical trial. The ability to quantify any mAb in serum via the reduced light chain without the need for reagents specific for each mAb demonstrates the unique capabilities of LC-MS. This fact, coupled with the ability to phenotype a patient's polyclonal repertoire in the same analysis, further shows the potential of this approach to mAb analysis.

    View details for DOI 10.1021/acs.analchem.6b00544

    View details for Web of Science ID 000378470200034

    View details for PubMedID 27228216

  • Genome assembly from synthetic long read clouds. Bioinformatics Kuleshov, V., Snyder, M. P., Batzoglou, S. 2016; 32 (12): i216-i224

    Abstract

    Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Our source code is freely available at https://github.com/kuleshov/architect. Contact: kuleshov@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btw267

    View details for Web of Science ID 000379734300025

    View details for PubMedID 27307620

    View details for PubMedCentralID PMC4908351

  • Effects of cellular origin on differentiation of human induced pluripotent stem cell-derived endothelial cells. JCI insight Hu, S., Zhao, M., Jahanbani, F., Shao, N., Lee, W. H., Chen, H., Snyder, M. P., Wu, J. C. 2016; 1 (8)

    Abstract

    Human induced pluripotent stem cells (iPSCs) can be derived from various types of somatic cells by transient overexpression of 4 Yamanaka factors (OCT4, SOX2, C-MYC, and KLF4). Patient-specific iPSC derivatives (e.g., neuronal, cardiac, hepatic, muscular, and endothelial cells [ECs]) hold great promise in drug discovery and regenerative medicine. In this study, we aimed to evaluate whether the cellular origin can affect the differentiation, in vivo behavior, and single-cell gene expression signatures of human iPSC-derived ECs. We derived human iPSCs from 3 types of somatic cells of the same individuals: fibroblasts (FB-iPSCs), ECs (EC-iPSCs), and cardiac progenitor cells (CPC-iPSCs). We then differentiated them into ECs by sequential administration of Activin, BMP4, bFGF, and VEGF. EC-iPSCs at early passage (10 < P < 20) showed higher EC differentiation propensity and gene expression of EC-specific markers (PECAM1 and NOS3) than FB-iPSCs and CPC-iPSCs. In vivo transplanted EC-iPSC-ECs were recovered with a higher percentage of CD31(+) population and expressed higher EC-specific gene expression markers (PECAM1, KDR, and ICAM) as revealed by microfluidic single-cell quantitative PCR (qPCR). In vitro EC-iPSC-ECs maintained a higher CD31(+) population than FB-iPSC-ECs and CPC-iPSC-ECs with long-term culturing and passaging. These results indicate that cellular origin may influence lineage differentiation propensity of human iPSCs; hence, the somatic memory carried by early passage iPSCs should be carefully considered before clinical translation.

    View details for PubMedID 27398408

  • The genetic predisposition to bronchopulmonary dysplasia CURRENT OPINION IN PEDIATRICS Yu, K., Li, J., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2016; 28 (3): 318-323

    Abstract

    Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease in premature infants. Twin studies have shown strong heritability underlying this disease; however, the genetic architecture of BPD remains unclear. A number of studies employed different approaches to characterize the genetic aberrations associated with BPD, including candidate gene studies, genome-wide association studies, exome sequencing, integrative omics analysis, and pathway analysis. Candidate gene studies identified a number of genes potentially involved with the development of BPD, but the etiological contribution from each gene is not substantial. Copy number variation studies and three independent genome-wide association studies did not identify genetic variations significantly and consistently associated with BPD. A recent exome-sequencing study pointed to rare variants implicated in the disease. In this review, we summarize these studies' methodology and findings, and suggest future research directions to better understand the genetic underpinnings of this potentially life-long lung disease. Genetic factors play a significant role in the development of BPD. Recent studies suggested that rare variants in genes participating in lung development pathways could contribute to BPD susceptibility.

    View details for DOI 10.1097/MOP.0000000000000344

    View details for Web of Science ID 000376387000010

    View details for PubMedID 26963946

    View details for PubMedCentralID PMC4853271

  • Concerted genomic targeting of H3K27 demethylase REF6 and chromatin-remodeling ATPase BRM in Arabidopsis NATURE GENETICS Li, C., Gu, L., Gao, L., Chen, C., Wei, C., Qiu, Q., Chien, C., Wang, S., Jiang, L., Ai, L., Chen, C., Yang, S., Nguyen, V., Qi, Y., Snyder, M. P., Burlingame, A. L., Kohalmi, S. E., Huang, S., Cao, X., Wang, Z., Wu, K., Chen, X., Cui, Y. 2016; 48 (6): 687-?

    Abstract

    SWI/SNF-type chromatin remodelers, such as BRAHMA (BRM), and H3K27 demethylases both have active roles in regulating gene expression at the chromatin level, but how they are recruited to specific genomic sites remains largely unknown. Here we show that RELATIVE OF EARLY FLOWERING 6 (REF6), a plant-unique H3K27 demethylase, targets genomic loci containing a CTCTGYTY motif via its zinc-finger (ZnF) domains and facilitates the recruitment of BRM. Genome-wide analyses showed that REF6 colocalizes with BRM at many genomic sites with the CTCTGYTY motif. Loss of REF6 results in decreased BRM occupancy at BRM-REF6 co-targets. Furthermore, REF6 directly binds to the CTCTGYTY motif in vitro, and deletion of the motif from a target gene renders it inaccessible to REF6 in vivo. Finally, we show that, when its ZnF domains are deleted, REF6 loses its genomic targeting ability. Thus, our work identifies a new genomic targeting mechanism for an H3K27 demethylase and demonstrates its key role in recruiting the BRM chromatin remodeler.

    View details for DOI 10.1038/ng.3555

    View details for PubMedID 27111034

  • Age-Dependent Pancreatic Gene Regulation Reveals Mechanisms Governing Human β Cell Function CELL METABOLISM Arda, H. E., Li, L., Tsai, J., Torre, E. A., Rosli, Y., Peiris, H., Spitale, R. C., Dai, C., Gu, X., Qu, K., Wang, P., Wang, J., Grompe, M., Scharfmann, R., Snyder, M. S., Bottino, R., Powers, A. C., Chang, H. Y., Kim, S. K. 2016; 23 (5): 909-920

    Abstract

    Intensive efforts are focused on identifying regulators of human pancreatic islet cell growth and maturation to accelerate development of therapies for diabetes. After birth, islet cell growth and function are dynamically regulated; however, establishing these age-dependent changes in humans has been challenging. Here, we describe a multimodal strategy for isolating pancreatic endocrine and exocrine cells from children and adults to identify age-dependent gene expression and chromatin changes on a genomic scale. These profiles revealed distinct proliferative and functional states of islet α cells or β cells and histone modifications underlying age-dependent gene expression changes. Expression of SIX2 and SIX3, transcription factors without prior known functions in the pancreas and linked to fasting hyperglycemia risk, increased with age specifically in human islet β cells. SIX2 and SIX3 were sufficient to enhance insulin content or secretion in immature β cells. Our work provides a unique resource to study human-specific regulators of islet cell maturation and function.

    View details for DOI 10.1016/j.cmet.2016.04.002

    View details for PubMedID 27133132

  • Can Metabolic Profiles Be Used as a Phenotypic Readout of the Genome to Enhance Precision Medicine? CLINICAL CHEMISTRY Contrepois, K., Liang, L., Snyder, M. 2016; 62 (5): 676–78

    View details for PubMedID 26960666

    View details for PubMedCentralID PMC4851585

  • Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection BMC BIOINFORMATICS Zhang, Q., Zeng, X., Younkin, S., Kawli, T., Snyder, M. P., Keles, S. 2016; 17

    Abstract

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

    View details for DOI 10.1186/s12859-016-0957-1

    View details for Web of Science ID 000370775000001

    View details for PubMedID 26908256

    View details for PubMedCentralID PMC4765064

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nature genetics Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125

    Abstract

    Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

    View details for DOI 10.1038/ng.3471

    View details for PubMedID 26691984

  • Protein substrates of the arginine methyltransferase Hmt1 identified by proteome arrays PROTEOMICS Low, J. K., Im, H., Erce, M. A., Hart-Smith, G., Snyder, M. P., Wilkins, M. R. 2016; 16 (3): 465–76

    Abstract

    Arginine methylation on nonhistone proteins is associated with a number of cellular processes including RNA splicing, protein localization, and the formation of protein complexes. In this manuscript, Saccharomyces cerevisiae proteome arrays carrying 4228 proteins were used with an antimethylarginine antibody to first identify 88 putatively arginine-methylated proteins. By treating the arrays with recombinant arginine methyltransferase Hmt1, 42 proteins were found to be possible substrates of this enzyme. Analysis of the putative arginine-methylated proteins revealed that they were predominantly nuclear or nucleolar in localization, consistent with the localization of Hmt1. Many are involved in known methylarginine-associated functions, such as RNA processing and ribonucleoprotein complex biogenesis, yet others are of newer classes, namely RNA/DNA helicases and tRNA-associated proteins. Using ex vivo methylation and MS/MS, a set of 12 proteins (Brr1, Dia4, Hts1, Mpp10, Mrd1, Nug1, Prp43, Rpa43, Rrp43, Spp381, Utp4, and Npl3), including the RNA helicase Prp43 and tRNA ligases Dia4 and Hts1, were all validated as Hmt1 substrates. Interestingly, the majority of these also had human orthologs, or family members, that have been documented elsewhere to carry arginine methylation. These results confirm arginine methylation as a widespread modification and Hmt1 as the major arginine methyltransferase in the S. cerevisiae cell.

    View details for PubMedID 26572822

  • Effects of Formalin Fixation Variables on DNA Integrity for Genomic Applications in Cancer Lefterova, M., Clark, M. J., Alla, R. K., Luo, S., Morra, M., Helman, E., Boyle, S. M., Kirk, S., Sripakdeevong, P., Karbelashvili, M., Church, D. M., Snyder, M. P., West, J., Chen, R. NATURE PUBLISHING GROUP. 2016: 516A–517A
  • Proteome-wide survey of the autoimmune target repertoire in autoimmune polyendocrine syndrome type 1 SCIENTIFIC REPORTS Landegren, N., Sharon, D., Freyhult, E., Hallgren, A., Eriksson, D., Edqvist, P., Bensing, S., Wahlberg, J., Nelson, L. M., Gustafsson, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2016; 6

    Abstract

    Autoimmune polyendocrine syndrome type 1 (APS1) is a monogenic disorder that features multiple autoimmune disease manifestations. It is caused by mutations in the Autoimmune regulator (AIRE) gene, which promotes thymic display of thousands of peripheral tissue antigens in a process critical for establishing central immune tolerance. We here used proteome arrays to perform a comprehensive study of autoimmune targets in APS1. Interrogation of established autoantigens revealed highly reliable detection of autoantibodies, and by exploring the full panel of more than 9000 proteins we further identified MAGEB2 and PDILT as novel major autoantigens in APS1. Our proteome-wide assessment revealed a marked enrichment for tissue-specific immune targets, mirroring AIRE's selectiveness for this category of genes. Our findings also suggest that only a very limited portion of the proteome becomes targeted by the immune system in APS1, which contrasts with the broad defect of thymic presentation associated with AIRE deficiency and raises novel questions about what other factors are needed to break tolerance.

    View details for DOI 10.1038/srep20104

    View details for PubMedID 26830021

  • Distance from sub-Saharan Africa predicts mutational load in diverse human genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Henn, B. M., Botigué, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K., Martin, A. R., Musharoff, S., Cann, H., Snyder, M. P., Excoffier, L., Kidd, J. M., Bustamante, C. D. 2016; 113 (4): E440-E449

    Abstract

    The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

    View details for DOI 10.1073/pnas.1510805112

    View details for Web of Science ID 000368617900008

    View details for PubMedID 26712023

    View details for PubMedCentralID PMC4743782

  • Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing G3-GENES GENOMES GENETICS Shoemaker, L. D., Clark, M. J., Patwardhan, A., Chandratillake, G., Garcia, S., Chen, R., Morgan, A. A., Leng, N., Kirk, S., Chen, R., Cook, D. J., Snyder, M., Steinberg, G. K. 2016; 6 (1): 41-49

    Abstract

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10-12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10(-5)) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10(-4)) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10(-4)) and non-RNF213 founder mutation (P = 1.51×10(-3)) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10(-4)) and non-RNF213 founder mutation cases (P = 5.31×10(-5)). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis.

    View details for DOI 10.1534/g3.115.020321

    View details for Web of Science ID 000367725000004

    View details for PubMedCentralID PMC4704723

  • Secure cloud computing for genomic data Nature Biotechnology Datta, S., Bettinger, K., Snyder, M. 2016; 34 (6): 588-91

    View details for DOI 10.1038/nbt.3496

  • Yeast longevity promoted by reversing aging-associated decline in heavy isotope content npj Aging and Mechanisms of Disease Li, X., Snyder, M. P. 2016; 2 (16004): 16004

    Abstract

    Dysregulation of metabolism develops with organismal aging. Both genetic and environmental manipulations promote longevity by effectively diverting various metabolic processes against aging. How these processes converge on the metabolome is not clear. Here we report that the heavy isotopic forms of common elements, a universal feature of metabolites, decline in yeast cells undergoing chronological aging. Supplementation of deuterium, a heavy hydrogen isotope, through heavy water (D2O) uptake extends yeast chronological lifespan (CLS) by up to 85% with minimal effects on growth. The CLS extension by D2O bypasses several known genetic regulators, but is abrogated by calorie restriction and mitochondrial deficiency. Heavy water substantially suppresses endogenous generation of reactive oxygen species (ROS) and slows the pace of metabolic consumption and disposal. Protection from aging by heavy isotopes might result from kinetic modulation of biochemical reactions. Altogether, our findings reveal a novel perspective of aging and new means for promoting longevity.

    View details for DOI 10.1038/npjamd.2016.4

    View details for PubMedCentralID PMC5515009

  • Harnessing big data for precision medicine: infrastructures and applications. Pacific Symposium on Biocomputing Yu, K., Hart, S. N., Goldfeder, R., Zhang, Q. C., Parker, S. C., Snyder, M. 2016; 22: 635-639

    Abstract

    Precision medicine is a health management approach that accounts for individual differences in genetic backgrounds and environmental exposures. With the recent advancements in high-throughput omics profiling technologies, collections of large study cohorts, and the developments of data mining algorithms, big data in biomedicine is expected to provide novel insights into health and disease states, which can be translated into personalized disease prevention and treatment plans. However, petabytes of biomedical data generated by multiple measurement modalities pose a significant challenge for data analysis, integration, storage, and result interpretation. In addition, patient privacy preservation, coordination between participating medical centers and data analysis working groups, as well as discrepancies in data sharing policies remain important topics of discussion. In this workshop, we invite experts in omics integration, biobank research, and data management to share their perspectives on leveraging big data to enable precision medicine. Workshop website: http://tinyurl.com/PSB17BigData; HashTag: #PSB17BigData.

    View details for PubMedID 27897013

  • NIH working group report-using genomic information to guide weight management: From universal to precision treatment OBESITY Bray, M. S., Loos, R. J., McCaffery, J. M., Ling, C., Franks, P. W., Weinstock, G. M., Snyder, M. P., Vassy, J. L., Agurs-Collins, T. 2016; 24 (1): 14-22

    Abstract

    Precision medicine utilizes genomic and other data to optimize and personalize treatment. Although more than 2,500 genetic tests are currently available, largely for extreme and/or rare phenotypes, the question remains whether this approach can be used for the treatment of common, complex conditions like obesity, inflammation, and insulin resistance, which underlie a host of metabolic diseases. This review, developed from a Trans-NIH Conference titled "Genes, Behaviors, and Response to Weight Loss Interventions," provides an overview of the state of genetic and genomic research in the area of weight change and identifies key areas for future research. Although many loci have been identified that are associated with cross-sectional measures of obesity/body size, relatively little is known regarding the genes/loci that influence dynamic measures of weight change over time. Although successful short-term weight loss has been achieved using many different strategies, sustainable weight loss has proven elusive for many, and there are important gaps in our understanding of energy balance regulation. Elucidating the molecular basis of variability in weight change has the potential to improve treatment outcomes and inform innovative approaches that can simultaneously take into account information from genomic and other sources in devising individualized treatment plans.

    View details for DOI 10.1002/oby.21381

    View details for PubMedID 26692578

  • Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal PLOS ONE Li, X., Guo, Y., Yan, W., Snyder, M. P., Li, X. 2015; 10 (12)

    Abstract

    Metformin, a leading drug used to treat diabetic patients, is reported to benefit bone homeostasis under hyperglycemia in animal models. However, both the molecular targets and the biological pathways affected by metformin in bone are not well identified or characterized. The objective of this study is to investigate the bioenergetic pathways affected by metformin in bone marrow cells of mice. Metabolite levels were examined in bone marrow samples extracted from metformin- or PBS-treated healthy (wild-type) and hyperglycemic (diabetic) mice using liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. We applied an untargeted high-performance LC-MS approach that combined multimode chromatography (ion exchange, reversed phase, and hydrophilic interaction (HILIC)) and Orbitrap-based ultra-high accuracy mass spectrometry to achieve wide coverage. Multivariate clustering was applied to reveal the global trends and major metabolite players. A total of 346 unique metabolites were identified, and they were grouped into distinctive clusters that reflected general and diabetes-specific responses to metformin. As evidenced by changes in the TCA and urea cycles, increased catabolism and nitrogen waste that are commonly associated with diabetes were rebalanced upon treatment with metformin. In particular, we found that glutamate and succinate, whose levels were drastically elevated in diabetic animals, were brought back to normal levels by metformin. These two metabolites were further validated as the major targets of metformin in bone marrow stromal cells. Overall, using a limited sample size, our study revealed the metabolic pathways modulated by metformin in bones, which have broad implications for our understanding of bone remodeling under hyperglycemia and for finding therapeutic interventions in mammals.

    View details for DOI 10.1371/journal.pone.0146152

    View details for Web of Science ID 000367510500137

    View details for PubMedCentralID PMC4696809

  • Integrated Proteomic and Genomic Analysis of Gastric Cancer Patient Tissues JOURNAL OF PROTEOME RESEARCH Yan, J. F., Kim, H., Jeong, S., Lee, H., Sethi, M. K., Lee, L. Y., Beavis, R. C., Im, H., Snyder, M. P., Hofree, M., Ideker, T., Wu, S., Paik, Y., Fanayan, S., Hancock, W. S. 2015; 14 (12): 4995-5006

    Abstract

    V-erb-b2 erythroblastic leukemia viral oncogene homologue 2, known as ERBB2, is an important oncogene in the development of certain cancers. It can form a heterodimer with other epidermal growth factor receptor family members and activate kinase-mediated downstream signaling pathways. The ERBB2 gene is located on chromosome 17 and is amplified in a subset of cancers, such as breast, gastric, and colon cancer. Of particular interest to the Chromosome-Centric Human Proteome Project (C-HPP) initiative is the amplification mechanism that typically results in overexpression of a set of genes adjacent to ERBB2, which provides evidence of a linkage between gene location and expression. In this report we studied ERBB2-positive patient tumor samples together with adjacent control nontumor tissues. In addition, non-ERBB2-expressing patient samples were selected as a comparison to study the effect of expression of this oncogene. We detected 196 proteins in ERBB2-positive patient tumor samples that had minimal overlap (29 proteins) with the non-ERBB2 tumor samples. Interaction and pathway analysis identified extracellular signal regulated kinase (ERK) cascade and actin polymerization and actin-myosin assembly contraction as pathways of importance in ERBB2+ and ERBB2- gastric cancer samples, respectively. The raw data files are deposited at ProteomeXchange (identifier: PXD002674) as well as GPMDB.

    View details for DOI 10.1021/acs.jproteome.5b00827

    View details for PubMedID 26435392

  • Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans GENOME RESEARCH Cenik, C., Cenik, E. S., Byeon, G. W., Grubert, F., Candille, S. I., Spacek, D., Alsallakh, B., Tilgner, H., Araya, C. L., Tang, H., Ricci, E., Snyder, M. P. 2015; 25 (11): 1610-1621

    Abstract

    Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin has been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels, suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy; many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation.

    View details for DOI 10.1101/gr.193342.115

    View details for Web of Science ID 000364355600003

    View details for PubMedID 26297486

    View details for PubMedCentralID PMC4617958

  • Design and Implementation of the International Genetics and Translational Research in Transplantation Network TRANSPLANTATION Keating, B. J., van Setten, J., Jacobson, P. A., Holmes, M. V., Verma, S. S., Chandrupatla, H. R., Nair, N., Gao, H., Li, Y. R., Chang, B., Wong, C., Phillips, R., Cole, B. S., Mukhtar, E., Zhang, W., Cao, H., Mohebnasab, M., Hou, C., Lee, T., Steel, L., Shaked, O., Garifallou, J., Miller, M. B., Karczewski, K. J., Akdere, A., Gonzalez, A., Lloyd, K. M., McGinn, D., Michaud, Z., Colasacco, A., Lek, M., Fu, Y., Pawashe, M., Guettouche, T., Himes, A., Perez, L., Guan, W., Wu, B., Schladt, D., Menon, M., Zhang, Z., Tragante, V., de Jonge, N., Otten, H. G., de Weger, R. A., van de Graaf, E. A., Baan, C. C., Manintveld, O. C., De Vlaminck, I., Piening, B. D., Strehl, C., Shaw, M., Snieder, H., Klintmalm, G. B., O'Leary, J. G., Amaral, S., Goldfarb, S., Rand, E., Rossano, J. W., Kohli, U., Heeger, P., Stahl, E., Christie, J. D., Fuentes, M. H., Levine, J. E., Aplenc, R., Schadt, E. E., Stranger, B. E., Kluin, J., Potena, L., Zuckermann, A., Khush, K., Alzahrani, A. J., Al-Muhanna, F. A., Al-Ali, A. K., Al-Ali, R., Al-Rubaish, A. M., Al-Mueilo, S., Byrne, E. M., Miller, D., Alexander, S. I., Onengut-Gumuscu, S., Rich, S. S., Suthanthiran, M., Tedesco, H., Saw, C. L., Ragoussis, J., Kfoury, A. G., Horne, B., Carlquist, J., Gerstein, M. B., Reindl-Schwaighofer, R., Oberbauer, R., Wijmenga, C., Palmer, S., Pereira, A. C., Segovia, J., Alonso-Pulpon, L. A., Comez-Bueno, M., Vilches, C., Jaramillo, N., de Borst, M. H., Naesens, M., Hao, K., MacArthur, D., Balasubramanian, S., Conlon, P. J., Lord, G. M., Ritchie, M. D., Snyder, M., Olthoff, K. M., Moore, J. H., Petersdorf, E. W., Kamoun, M., Wang, J., Monos, D. S., de Bakker, P. I., Hakonarson, H., Murphy, B., Lankree, M. B., Garcia-Pavia, P., Oetting, W. S., Birdwell, K. A., Bakker, S. J., Israni, A. K., Shaked, A., Asselbergs, F. W. 2015; 99 (11): 2401-2412

    Abstract

    Genetic association studies of transplantation outcomes have been hampered by small samples and highly complex multifactorial phenotypes, hindering investigations of the genetic architecture of a range of comorbidities which significantly impact graft and recipient life expectancy. We describe here the rationale and design of the International Genetics & Translational Research in Transplantation Network. The network comprises 22 studies to date, including 16494 transplant recipients and 11669 donors, of whom more than 5000 are of non-European ancestry, all of whom have existing genomewide genotype data sets. We describe the rich genetic and phenotypic information available in this consortium comprising heart, kidney, liver, and lung transplant cohorts. We demonstrate significant power in International Genetics & Translational Research in Transplantation Network to detect main effect association signals across regions such as the MHC region as well as genomewide for transplant outcomes that span all solid organs, such as graft survival, acute rejection, new onset of diabetes after transplantation, and for delayed graft function in kidney only. This consortium is designed and statistically powered to deliver pioneering insights into the genetic architecture of transplant-related outcomes across a range of different solid-organ transplant studies. The study design allows a spectrum of analyses to be performed including recipient-only analyses, donor-recipient HLA mismatches with focus on loss-of-function variants and nonsynonymous single nucleotide polymorphisms.

    View details for DOI 10.1097/TP.0000000000000913

    View details for Web of Science ID 000369087800037

    View details for PubMedCentralID PMC4623847

  • Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data PLOS GENETICS Dewey, F. E., Grove, M. E., Priest, J. R., Waggott, D., Batra, P., Miller, C. L., Wheeler, M., Zia, A., Pan, C., Karzcewski, K. J., Miyake, C., Whirl-Carrillo, M., Klein, T. E., Datta, S., Altman, R. B., Snyder, M., Quertermous, T., Ashley, E. A. 2015; 11 (10)

    Abstract

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

    View details for DOI 10.1371/journal.pgen.1005496

    View details for Web of Science ID 000364401600008

    View details for PubMedID 26448358

    View details for PubMedCentralID PMC4598191

  • Mango: a bias-correcting ChIA-PET analysis pipeline. Bioinformatics Phanstiel, D. H., Boyle, A. P., Heidari, N., Snyder, M. P. 2015; 31 (19): 3092-3098

    Abstract

    Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) is an established method for detecting genome-wide looping interactions at high resolution. Current ChIA-PET analysis software packages either fail to correct for non-specific interactions due to genomic proximity or only address a fraction of the steps required for data processing. We present Mango, a complete ChIA-PET data analysis pipeline that provides statistical confidence estimates for interactions and corrects for major sources of bias including differential peak enrichment and genomic proximity. Comparison to the existing software packages, ChIA-PET Tool and ChiaSig, revealed that Mango interactions exhibit much better agreement with high-resolution Hi-C data. Importantly, Mango executes all steps required for processing ChIA-PET datasets, whereas ChiaSig only completes 20% of the required steps. Application of Mango to multiple available ChIA-PET datasets permitted the independent rediscovery of known trends in chromatin loops including enrichment of CTCF, RAD21, SMC3 and ZNF143 at the anchor regions of interactions and strong bias for convergent CTCF motifs. Mango is open source and distributed through github at https://github.com/dphansti/mango. Contact: mpsnyder@standford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btv336

    View details for PubMedID 26034063

    View details for PubMedCentralID PMC4592333

  • Exome Sequencing of Neonatal Blood Spots and the Identification of Genes Implicated in Bronchopulmonary Dysplasia. American journal of respiratory and critical care medicine Li, J., Yu, K., Oehlert, J., Jeliffe-Pawlowski, L. L., Gould, J. B., Stevenson, D. K., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2015; 192 (5): 589-596

    Abstract

    Bronchopulmonary dysplasia (BPD), a prevalent severe lung disease of premature infants, has a strong genetic component. Large-scale genome-wide association studies for common variants have not revealed its genetic basis. Given the historical high mortality rate of extremely preterm infants who now survive and develop BPD, we hypothesized that risk loci underlying this disease are under severe purifying selection during evolution; thus, rare variants likely explain greater risk of the disease. We performed exome sequencing on 50 BPD-affected and unaffected twin pairs using DNA isolated from neonatal blood spots and identified genes affected by extremely rare nonsynonymous mutations. Functional genomic approaches were then used to systematically compare these affected genes. We identified 258 genes with rare nonsynonymous mutations in patients with BPD. These genes were highly enriched for processes involved in pulmonary structure and function including collagen fibril organization, morphogenesis of embryonic epithelium, and regulation of Wnt signaling pathway; displayed significantly elevated expression in fetal and adult lungs; and were substantially up-regulated in a murine model of BPD. Analyses of mouse mutants revealed their phenotypic enrichment for embryonic development and the cyanosis phenotype, a clinical manifestation of BPD. Our study supports the role of rare variants in BPD, in contrast with the role of common variants targeted by genome-wide association studies. Overall, our study is the first to sequence BPD exomes from newborn blood spot samples and identify with high confidence genes implicated in BPD, thereby providing important insights into its biology and molecular etiology.

    View details for DOI 10.1164/rccm.201501-0168OC

    View details for PubMedID 26030808

  • Genomic analysis of mycosis fungoides and Sézary syndrome identifies recurrent alterations in TNFR2. Nature genetics Ungewickell, A., Bhaduri, A., Rios, E., Reuter, J., Lee, C. S., Mah, A., Zehnder, A., Ohgami, R., Kulkarni, S., Armstrong, R., Weng, W., Gratzinger, D., Tavallaee, M., Rook, A., Snyder, M., Kim, Y., Khavari, P. A. 2015; 47 (9): 1056-1060

    Abstract

    Mycosis fungoides and Sézary syndrome comprise the majority of cutaneous T cell lymphomas (CTCLs), disorders notable for their clinical heterogeneity that can present in skin or peripheral blood. Effective treatment options for CTCL are limited, and the genetic basis of these T cell lymphomas remains incompletely characterized. Here we report recurrent point mutations and genomic gains of TNFRSF1B, encoding the tumor necrosis factor receptor TNFR2, in 18% of patients with mycosis fungoides and Sézary syndrome. Expression of the recurrent TNFR2 Thr377Ile mutant in T cells leads to enhanced non-canonical NF-κB signaling that is sensitive to the proteasome inhibitor bortezomib. Using an integrative genomic approach, we additionally discovered a recurrent CTLA4-CD28 fusion, as well as mutations in downstream signaling mediators of these receptors.

    View details for DOI 10.1038/ng.3370

    View details for PubMedID 26258847

  • Evaluating Common Humoral Responses against Fungal Infections with Yeast Protein Microarrays JOURNAL OF PROTEOME RESEARCH Coelho, P. S., Im, H., Clemons, K. V., Snyder, M. P., Stevens, D. A. 2015; 14 (9): 3924-3931

    Abstract

    We profiled the global immunoglobulin response against fungal infection by using yeast protein microarrays. Groups of CD-1 mice were infected systemically with human fungal pathogens (Coccidioides posadasii, Candida albicans, or Paracoccidioides brasiliensis) or inoculated with PBS as a control. Another group was inoculated with heat-killed yeast (HKY) of Saccharomyces cerevisiae. After 30 days, serum from mice in the groups were collected and used to probe S. cerevisiae protein microarrays containing 4800 full-length glutathione S-transferase (GST)-fusion proteins. Antimouse IgG conjugated with Alexafluor 555 and anti-GST antibody conjugated with Alexafluor 647 were used to detect antibody-antigen interactions and the presence of GST-fusion proteins, respectively. Serum after infection with C. albicans reacted with 121 proteins: C. posadasii, 81; P. brasiliensis, 67; and after HKY, 63 proteins on the yeast protein microarray, respectively. We identified a set of 16 antigenic proteins that were shared across the three fungal pathogens. These include retrotransposon capsid proteins, heat shock proteins, and mitochondrial proteins. Five of these proteins were identified in our previous study of fungal cell wall by mass spectrometry (Ann. N. Y. Acad. Sci. 2012, 1273, 44-51). The results obtained give a comprehensive view of the immunological responses to fungal infections at the proteomic level. They also offer insight into immunoreactive protein commonality among several fungal pathogens and provide a basis for a panfungal vaccine.

    View details for DOI 10.1021/acs.jproteome.5b00365

    View details for PubMedID 26258609

  • RNA Sequencing Analysis Detection of a Novel Pathway of Endothelial Dysfunction in Pulmonary Arterial Hypertension AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Rhodes, C. J., Im, H., Cao, A., Hennigs, J. K., Wang, L., Sa, S., Chen, P., Nickel, N. P., Miyagawa, K., Hopper, R. K., Tojais, N. F., Li, C. G., Gu, M., Spiekerkoetter, E., Xian, Z., Chen, R., Zhao, M., Kaschwich, M., del Rosario, P. A., Bernstein, D., Zamanian, R. T., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2015; 192 (3): 356-366

    Abstract

    Pulmonary arterial hypertension is characterized by endothelial dysregulation, but global changes in gene expression have not been related to perturbations in function. RNA sequencing was utilized to discriminate changes in transcriptomes of endothelial cells cultured from lungs of patients with idiopathic pulmonary arterial hypertension vs. controls and to assess the functional significance of major differentially expressed transcripts. The endothelial transcriptomes from seven control and six idiopathic pulmonary arterial hypertension patients' lungs were analyzed. Differentially expressed genes were related to BMPR2 signaling. Those downregulated were assessed for function in cultured cells, and in a transgenic mouse. Fold-differences in ten genes were significant (p<0.05), four increased and six decreased in patients vs. controls. No patient was mutant for BMPR2. However, knockdown of BMPR2 by siRNA in control pulmonary arterial endothelial cells recapitulated six/ten patient-related gene changes, including decreased collagen IV (COL4A1, COL4A2) and ephrinA1 (EFNA1). Reduction of BMPR2 regulated transcripts was related to decreased β-catenin. Reducing COL4A1, COL4A2 and EFNA1 by siRNA inhibited pulmonary endothelial adhesion, migration and tube formation. In mice null for the EFNA1 receptor, EphA2, vs. controls, VEGF receptor blockade and hypoxia caused more severe pulmonary hypertension, judged by elevated right ventricular systolic pressure, right ventricular hypertrophy and loss of small arteries. The novel relationship between BMPR2 dysfunction and reduced expression of endothelial COL4 and EFNA1 may underlie vulnerability to injury in pulmonary arterial hypertension.

    View details for DOI 10.1164/rccm.201408-1528OC

    View details for PubMedID 26030479

  • Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions JOVE-JOURNAL OF VISUALIZED EXPERIMENTS Fasolo, J., Im, H., Snyder, M. P. 2015

    Abstract

    High-density functional protein microarrays containing ~4,200 recombinant yeast proteins are examined for kinase protein-protein interactions using an affinity purified yeast kinase fusion protein containing a V5-epitope tag for read-out. Purified kinase is obtained through culture of a yeast strain optimized for high copy protein production harboring a plasmid containing a Kinase-V5 fusion construct under a GAL inducible promoter. The yeast is grown in restrictive media with a neutral carbon source for 6 hr followed by induction with 2% galactose. Next, the culture is harvested and kinase is purified using standard affinity chromatographic techniques to obtain a highly purified protein kinase for use in the assay. The purified kinase is diluted with kinase buffer to an appropriate range for the assay and the protein microarrays are blocked prior to hybridization with the protein microarray. After the hybridization, the arrays are probed with monoclonal V5 antibody to identify proteins bound by the kinase-V5 protein. Finally, the arrays are scanned using a standard microarray scanner, and data is extracted for downstream informatics analysis to determine a high confidence set of protein interactions for downstream validation in vivo.

    View details for DOI 10.3791/51872

    View details for Web of Science ID 000361537100003

    View details for PubMedID 26274875

    View details for PubMedCentralID PMC4545172

  • Single-cell chromatin accessibility reveals principles of regulatory variation NATURE Buenrostro, J. D., Wu, B., Litzenburger, U. M., Ruff, D., Gonzales, M. L., Snyder, M. P., Chang, H. Y., Greenleaf, W. J. 2015; 523 (7561): 486-490

    Abstract

    Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

    View details for DOI 10.1038/nature14590

    View details for Web of Science ID 000358378900042

  • Single-cell chromatin accessibility reveals principles of regulatory variation. Nature Buenrostro, J. D., Wu, B., Litzenburger, U. M., Ruff, D., Gonzales, M. L., Snyder, M. P., Chang, H. Y., Greenleaf, W. J. 2015; 523 (7561): 486-490

    Abstract

    Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

    View details for DOI 10.1038/nature14590

    View details for PubMedID 26083756

  • Achieving high-sensitivity for clinical applications using augmented exome sequencing GENOME MEDICINE Patwardhan, A., Harris, J., Leng, N., Bartha, G., Church, D. M., Luo, S., Haudenschild, C., Pratt, M., Zook, J., Salit, M., Tirch, J., Morra, M., Chervitz, S., Li, M., Clark, M., Garcia, S., Chandratillake, G., Kirk, S., Ashley, E., Snyder, M., Altman, R., Bustamante, C., Butte, A. J., West, J., Chen, R. 2015; 7

    Abstract

    Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.

    View details for DOI 10.1186/s13073-015-0197-4

    View details for Web of Science ID 000359428300001

    View details for PubMedID 26269718

    View details for PubMedCentralID PMC4534066

  • Recurrent somatic mutations in regulatory regions of human cancer genomes NATURE GENETICS Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-?

    Abstract

    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

    View details for DOI 10.1038/ng.3332

    View details for Web of Science ID 000357090300007

    View details for PubMedID 26053494

    View details for PubMedCentralID PMC4485503

  • Where Next for Genetics and Genomics? PLoS biology Tyler-Smith, C., Yang, H., Landweber, L. F., Dunham, I., Knoppers, B. M., Donnelly, P., Mardis, E. R., Snyder, M., McVean, G. 2015; 13 (7): e1002216

    Abstract

    The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don't foresee, even now.

    View details for DOI 10.1371/journal.pbio.1002216

    View details for PubMedID 26225775

    View details for PubMedCentralID PMC4520474

  • Where Next for Genetics and Genomics? PLOS BIOLOGY Tyler-Smith, C., Yang, H., Landweber, L. F., Dunham, I., Knoppers, B. M., Donnelly, P., Mardis, E. R., Snyder, M., McVean, G. 2015; 13 (7)

    Abstract

    The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don't foresee, even now.

    View details for DOI 10.1371/journal.pbio.1002216

    View details for Web of Science ID 000360617100023

    View details for PubMedCentralID PMC4520474

  • Metabolome progression during early gut microbial colonization of gnotobiotic mice SCIENTIFIC REPORTS Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5

    Abstract

    The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.

    View details for DOI 10.1038/srep11589

    View details for Web of Science ID 000357041700001

    View details for PubMedID 26118551

    View details for PubMedCentralID PMC4484351

  • Transglutaminase 4 as a prostate autoantigen in male subfertility SCIENCE TRANSLATIONAL MEDICINE Landegren, N., Sharon, D., Shum, A. K., Khan, I. S., Fasano, K. J., Hallgren, A., Kampf, C., Freyhult, E., Ardesjo-Lundgren, B., Alimohammadi, M., Rathsman, S., Ludvigsson, J. F., Lundh, D., Motrich, R., Rivero, V., Fong, L., Giwercman, A., Gustafsson, J., Perheentupa, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2015; 7 (292)

    Abstract

    Autoimmune polyendocrine syndrome type 1 (APS1), a monogenic disorder caused by AIRE gene mutations, features multiple autoimmune disease components. Infertility is common in both males and females with APS1. Although female infertility can be explained by autoimmune ovarian failure, the mechanisms underlying male infertility have remained poorly understood. We performed a proteome-wide autoantibody screen in APS1 patient sera to assess the autoimmune response against the male reproductive organs. By screening human protein arrays with male and female patient sera and by selecting for gender-imbalanced autoantibody signals, we identified transglutaminase 4 (TGM4) as a male-specific autoantigen. Notably, TGM4 is a prostatic secretory molecule with critical role in male reproduction. TGM4 autoantibodies were detected in most of the adult male APS1 patients but were absent in all the young males. Consecutive serum samples further revealed that TGM4 autoantibodies first presented during pubertal age and subsequent to prostate maturation. We assessed the animal model for APS1, the Aire-deficient mouse, and found spontaneous development of TGM4 autoantibodies specifically in males. Aire-deficient mice failed to present TGM4 in the thymus, consistent with a defect in central tolerance for TGM4. In the mouse, we further link TGM4 immunity with a destructive prostatitis and compromised secretion of TGM4. Collectively, our findings in APS1 patients and Aire-deficient mice reveal prostate autoimmunity as a major manifestation of APS1 with potential role in male subfertility.

    View details for DOI 10.1126/scitranslmed.aaa9186

    View details for PubMedID 26084804

  • Transcriptome Signature and Regulation in Human Somatic Cell Reprogramming STEM CELL REPORTS Tanaka, Y., Hysolli, E., Su, J., Xiang, Y., Kim, K., Zhong, M., Li, Y., Heydari, K., Euskirchen, G., Snyder, M. P., Pan, X., Weissman, S. M., Park, I. 2015; 4 (6): 1125-1139

    Abstract

    Reprogramming of somatic cells produces induced pluripotent stem cells (iPSCs) that are invaluable resources for biomedical research. Here, we extended the previous transcriptome studies by performing RNA-seq on cells defined by a combination of multiple cellular surface markers. We found that transcriptome changes during early reprogramming occur independently from the opening of closed chromatin by OCT4, SOX2, KLF4, and MYC (OSKM). Furthermore, our data identify multiple spliced forms of genes uniquely expressed at each progressive stage of reprogramming. In particular, we found a pluripotency-specific spliced form of CCNE1 that is specific to human and significantly enhances reprogramming. In addition, single nucleotide polymorphism (SNP) expression analysis reveals that monoallelic gene expression is induced in the intermediate stages of reprogramming, while biallelic expression is recovered upon completion of reprogramming. Our transcriptome data provide unique opportunities in understanding human iPSC reprogramming.

    View details for DOI 10.1016/j.stemcr.2015.04.009

    View details for Web of Science ID 000356068100017

    View details for PubMedID 26004630

    View details for PubMedCentralID PMC4471828

  • Optimized Analytical Procedures for the Untargeted Metabolomic Profiling of Human Urine and Plasma by Combining Hydrophilic Interaction (HILIC) and Reverse-Phase Liquid Chromatography (RPLC)-Mass Spectrometry MOLECULAR & CELLULAR PROTEOMICS Contrepois, K., Jiang, L., Snyder, M. 2015; 14 (6): 1684-1695

    Abstract

    Profiling of body fluids is crucial for monitoring and discovering metabolic markers of health and disease and for providing insights into human physiology. Since human urine and plasma each contain an extreme diversity of metabolites, a single liquid chromatographic system when coupled to mass spectrometry (MS) is not sufficient to achieve reasonable metabolome coverage. Hydrophilic interaction liquid chromatography (HILIC) offers complementary information to reverse-phase liquid chromatography (RPLC) by retaining polar metabolites. With the objective of finding the optimal combined chromatographic solution to profile urine and plasma, we systematically investigated the performance of five HILIC columns with different chemistries operated at three different pH (acidic, neutral, basic) and five C18-silica RPLC columns. The zwitterionic column ZIC-HILIC operated at neutral pH provided optimal performance on a large set of hydrophilic metabolites. The RPLC columns Hypersil GOLD and Zorbax SB aq were proven to be best suited for the metabolic profiling of urine and plasma, respectively. Importantly, the optimized HILIC-MS method showed excellent intrabatch peak area reproducibility (CV < 12%) and good long-term interbatch (40 days) peak area reproducibility (CV < 22%) that were similar to those of RPLC-MS procedures. Finally, combining the optimal HILIC- and RPLC-MS approaches greatly expanded metabolome coverage with 44% and 108% new metabolic features detected compared with RPLC-MS alone for urine and plasma, respectively. The proposed combined LC-MS approaches improve the comprehensiveness of global metabolic profiling of body fluids and thus are valuable for monitoring and discovering metabolic changes associated with health and disease in clinical research studies.

    View details for DOI 10.1074/mcp.M114.046508

    View details for Web of Science ID 000355550400019

    View details for PubMedID 25787789

    View details for PubMedCentralID PMC4458729

  • AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae (vol 10, e0120671, 2015) PLOS ONE Song, G., Dickins, B. A., Demeter, J., Engel, S., Gallagher, J., Choe, K., Dunn, B., Snyder, M., Cherry, J. 2015; 10 (5): e0129184

    View details for DOI 10.1371/journal.pone.0129184

    View details for Web of Science ID 000355185600125

    View details for PubMedID 26017550

    View details for PubMedCentralID PMC4446291

  • High-Throughput Sequencing Technologies MOLECULAR CELL Reuter, J. A., Spacek, D. V., Snyder, M. P. 2015; 58 (4): 586-597

    Abstract

    The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them, as well as the challenges facing current sequencing platforms and their clinical application.

    View details for DOI 10.1016/j.molcel.2015.05.004

    View details for Web of Science ID 000355154000007

    View details for PubMedCentralID PMC4494749

    View details for PubMedID 26000844

  • Characterization of Novel Transcripts in Pseudorabies Virus VIRUSES-BASEL Tombacz, D., Csabai, Z., Olah, P., Havelda, Z., Sharon, D., Snyder, M., Boldogkoi, Z. 2015; 7 (5): 2727-2744

    Abstract

    In this study we identified two 3'-coterminal RNA molecules in the pseudorabies virus. The highly abundant short transcript (CTO-S) proved to be encoded between the ul21 and ul22 genes in close vicinity of the replication origin (OriL) of the virus. The less abundant long RNA molecule (CTO-L) is a transcriptional readthrough product of the ul21 gene and overlaps OriL. These polyadenylated RNAs were characterized by ascertaining their nucleotide sequences with the Illumina HiScanSQ and Pacific Biosciences Real-Time (PacBio RSII) sequencing platforms and by analyzing their transcription kinetics through use of multi-time-point Real-Time RT-PCR and the PacBio RSII system. It emerged that transcription of the CTOs is fully dependent on the viral transactivator protein IE180 and CTO-S is not a microRNA precursor. We propose an interaction between the transcription and replication machineries at this genomic location, which might play an important role in the regulation of DNA synthesis.

    View details for DOI 10.3390/v7052727

    View details for Web of Science ID 000356228700027

    View details for PubMedID 26008709

    View details for PubMedCentralID PMC4452928

  • Impact of allele-specific peptides in proteome quantification PROTEOMICS CLINICAL APPLICATIONS Wu, L., Snyder, M. 2015; 9 (3-4): 432-436

    Abstract

    MS-based proteome technologies have greatly improved our ability to detect and quantify proteomes across various biological samples. High-throughput bottom-up proteome profiling in combination with a targeted MS method, e.g. an SRM assay, is emerging as a powerful approach in the field of biomarker discovery. In the past few years, an increasing number of studies have attempted to integrate genomic and proteomic data for biomarker discovery. Here, we describe how allele-specific peptides can be applied in biomarker discovery and their impact on protein quantification.

    View details for DOI 10.1002/prca.201400126

    View details for Web of Science ID 000353291000019

    View details for PubMedID 25676416

    View details for PubMedCentralID PMC4448739

  • Reassessment of Piwi Binding to the Genome and Piwi Impact on RNA Polymerase II Distribution DEVELOPMENTAL CELL Lin, H., Chen, M., Kundaje, A., Valouev, A., Yin, H., Liu, N., Neuenkirchen, N., Zhong, M., Snyder, M. 2015; 32 (6): 772-774

    Abstract

    Drosophila Piwi was reported by Huang et al. (2013) to be guided by piRNAs to piRNA-complementary sites in the genome, which then recruits heterochromatin protein 1a and histone methyltransferase Su(Var)3-9 to the sites. Among additional findings, Huang et al. (2013) also reported Piwi binding sites in the genome and the reduction of RNA polymerase II in euchromatin but its increase in pericentric regions in piwi mutants. Marinov et al. (2015) disputed the validity of the Huang et al. bioinformatic pipeline that led to the last two claims. Here we report our independent reanalysis of the data using current bioinformatic methods. Our reanalysis agrees with Marinov et al. (2015) that Piwi's genomic targets still remain to be identified but confirms the Huang et al. claim that Piwi influences RNA polymerase II distribution in the genome. This Matters Arising Response addresses the Marinov et al. (2015) Matters Arising, published concurrently in this issue of Developmental Cell.

    View details for DOI 10.1016/j.devcel.2015.03.004

    View details for PubMedID 25805139

  • The conserved histone deacetylase Rpd3 and its DNA binding subunit Ume6 control dynamic transcript architecture during mitotic growth and meiotic development NUCLEIC ACIDS RESEARCH Lardenois, A., Stuparevic, I., Liu, Y., Law, M. J., Becker, E., Smagulova, F., Waern, K., Guilleux, M., Horecka, J., Chu, A., Kervarrec, C., Strich, R., Snyder, M., Davis, R. W., Steinmetz, L. M., Primig, M. 2015; 43 (1): 115-128

    Abstract

    It was recently reported that the sizes of many mRNAs change when budding yeast cells exit mitosis and enter the meiotic differentiation pathway. These differences were attributed to length variations of their untranslated regions. The function of UTRs in protein translation is well established. However, the mechanism controlling the expression of distinct transcript isoforms during mitotic growth and meiotic development is unknown. In this study, we order developmentally regulated transcript isoforms according to their expression at specific stages during meiosis and gametogenesis, as compared to vegetative growth and starvation. We employ regulatory motif prediction, in vivo protein-DNA binding assays, genetic analyses and monitoring of epigenetic amino acid modification patterns to identify a novel role for Rpd3 and Ume6, two components of a histone deacetylase complex already known to repress early meiosis-specific genes in dividing cells, in mitotic repression of meiosis-specific transcript isoforms. Our findings classify developmental stage-specific early, middle and late meiotic transcript isoforms, and they point to a novel HDAC-dependent control mechanism for flexible transcript architecture during cell growth and differentiation. Since Rpd3 is highly conserved and ubiquitously expressed in many tissues, our results are likely relevant for development and disease in higher eukaryotes.

    View details for DOI 10.1093/nar/gku1185

    View details for Web of Science ID 000350207100017

    View details for PubMedID 25477386

    View details for PubMedCentralID PMC4288150

  • Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing. G3 (Bethesda, Md.) Shoemaker, L. D., Clark, M. J., Patwardhan, A., Chandratillake, G., Garcia, S., Chen, R., Morgan, A. A., Leng, N., Kirk, S., Chen, R., Cook, D. J., Snyder, M., Steinberg, G. K. 2015; 6 (1): 41-49

    Abstract

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10-12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10(-5)) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10(-4)) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10(-4)) and non-RNF213 founder mutation (P = 1.51×10(-3)) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10(-4)) and non-RNF213 founder mutation cases (P = 5.31×10(-5)). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis.

    View details for DOI 10.1534/g3.115.020321

    View details for PubMedID 26530418

  • AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae. PloS one Song, G., Dickins, B. J., Demeter, J., Engel, S., Gallagher, J., Choe, K., Dunn, B., Snyder, M., Cherry, J. M. 2015; 10 (3)

    Abstract

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

    View details for DOI 10.1371/journal.pone.0120671

    View details for PubMedID 25781462

  • Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal. PloS one Li, X., Guo, Y., Yan, W., Snyder, M. P., Li, X. 2015; 10 (12)

    Abstract

    Metformin, a leading drug used to treat diabetic patients, is reported to benefit bone homeostasis under hyperglycemia in animal models. However, both the molecular targets and the biological pathways affected by metformin in bone are not well identified or characterized. The objective of this study is to investigate the bioenergetic pathways affected by metformin in bone marrow cells of mice. Metabolite levels were examined in bone marrow samples extracted from metformin- or PBS-treated healthy (wild type) and hyperglycemic (diabetic) mice using liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. We applied an untargeted high performance LC-MS approach which combined multimode chromatography (ion exchange, reversed phase and hydrophilic interaction (HILIC)) and Orbitrap-based ultra-high accuracy mass spectrometry to achieve a wide coverage. A multivariate clustering was applied to reveal the global trends and major metabolite players. A total of 346 unique metabolites were identified, and they are grouped into distinctive clusters that reflected general and diabetes-specific responses to metformin. As evidenced by changes in the TCA and urea cycles, increased catabolism and nitrogen waste that are commonly associated with diabetes were rebalanced upon treatment with metformin. In particular, we found that glutamate and succinate, whose levels were drastically elevated in diabetic animals, were brought back to normal levels by metformin. These two metabolites were further validated as the major targets of metformin in bone marrow stromal cells. Overall, using a limited sample size, our study revealed the metabolic pathways modulated by metformin in bones, which have broad implications for our understanding of bone remodeling under hyperglycemia and for finding therapeutic interventions in mammals.

    View details for DOI 10.1371/journal.pone.0146152

    View details for PubMedID 26716870

  • Achieving high-sensitivity for clinical applications using augmented exome sequencing. Genome medicine Patwardhan, A., Harris, J., Leng, N., Bartha, G., Church, D. M., Luo, S., Haudenschild, C., Pratt, M., Zook, J., Salit, M., Tirch, J., Morra, M., Chervitz, S., Li, M., Clark, M., Garcia, S., Chandratillake, G., Kirk, S., Ashley, E., Snyder, M., Altman, R., Bustamante, C., Butte, A. J., West, J., Chen, R. 2015; 7 (1): 71-?

    Abstract

    Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.

    View details for DOI 10.1186/s13073-015-0197-4

    View details for PubMedID 26269718

  • Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis. Nature communications Fotiou, E., Martin-Almedina, S., Simpson, M. A., Lin, S., Gordon, K., Brice, G., Atton, G., Jeffery, I., Rees, D. C., Mignot, C., Vogt, J., Homfray, T., Snyder, M. P., Rockson, S. G., Jeffery, S., Mortimer, P. S., Mansour, S., Ostergaard, P. 2015; 6: 8085-?

    Abstract

    Generalized lymphatic dysplasia (GLD) is a rare form of primary lymphoedema characterized by a uniform, widespread lymphoedema affecting all segments of the body, with systemic involvement such as intestinal and/or pulmonary lymphangiectasia, pleural effusions, chylothoraces and/or pericardial effusions. This may present prenatally as non-immune hydrops. Here we report homozygous and compound heterozygous mutations in PIEZO1, resulting in an autosomal recessive form of GLD with a high incidence of non-immune hydrops fetalis and childhood onset of facial and four limb lymphoedema. Mutations in PIEZO1, which encodes a mechanically activated ion channel, have been reported with autosomal dominant dehydrated hereditary stomatocytosis and non-immune hydrops of unknown aetiology. Besides its role in red blood cells, our findings indicate that PIEZO1 is also involved in the development of lymphatic structures.

    View details for DOI 10.1038/ncomms9085

    View details for PubMedID 26333996

  • Whole-Exome Enrichment with the Agilent SureSelect Human All Exon Platform. Cold Spring Harbor protocols Chen, R., Im, H., Snyder, M. 2015; 2015 (7): pdb prot083659-?

    Abstract

    There are multiple platforms available for whole-exome enrichment and sequencing (WES). This protocol is based on the Agilent SureSelect Human All Exon platform, which targets ∼50 Mb of the human exonic regions. The SureSelect system uses ∼120-base RNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database as well as other major RNA coding sequence databases, such as Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.

    View details for DOI 10.1101/pdb.prot083659

    View details for PubMedID 25762417

  • Metabolome progression during early gut microbial colonization of gnotobiotic mice. Scientific reports Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5: 11589-?

    Abstract

    The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.

    View details for DOI 10.1038/srep11589

    View details for PubMedID 26118551

    View details for PubMedCentralID PMC4484351

  • Genomic analysis of fibrolamellar hepatocellular carcinoma. Human molecular genetics Xu, L., Hazard, F. K., Zmoos, A., Jahchan, N., Chaib, H., Garfin, P. M., Rangaswami, A., Snyder, M. P., Sage, J. 2015; 24 (1): 50-63

    Abstract

    Pediatric tumors are relatively infrequent but are often associated with significant lethality and lifelong morbidity. A major goal of pediatric cancer research has been to identify key drivers of tumorigenesis to eventually develop targeted therapies to enhance cure rate and minimize acute and long-term toxic effects. Here we used genomics approaches to identify biomarkers and candidate drivers for fibrolamellar hepatocellular carcinoma (FL-HCC), a very rare subtype of pediatric liver cancer for which limited therapeutic options exist. In-depth genomics analyses of one tumor followed by immunohistochemistry validation on seven other tumors showed expression of neuroendocrine markers in FL-HCC. DNA and RNA sequencing data further showed that common cancer pathways are not visibly altered in FL-HCC but identified two novel structural variants, both resulting in fusion transcripts. The first, a 400kb deletion, results in a DNAJ1-PRKCA fusion transcript, which leads to increased PKA activity in the index tumor case and other FL-HCC cases compared to normal liver. This PKA fusion protein is oncogenic in HCC cells. The second gene fusion event, a translocation between the CLPTML1 and GLIS3 genes, generates a transcript whose product also promotes cancer phenotypes in HCC cell lines. These experiments further highlight the tumorigenic role of gene fusions in the etiology of pediatric solid tumors and identify both candidate biomarkers and possible therapeutic targets for this lethal pediatric disease.

    View details for DOI 10.1093/hmg/ddu418

    View details for PubMedID 25122662

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss BMC GENOMICS Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15

    Abstract

    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45×10(-7)). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for Web of Science ID 000209598100001

  • Genomic era diagnosis and management of hereditary and sporadic colon cancer. World journal of clinical oncology Esplin, E. D., Snyder, M. P. 2014; 5 (5): 1036-1047

    Abstract

    The morbidity and mortality attributable to heritable and sporadic carcinomas of the colon are substantial and affect children and adults alike. Despite current colonoscopy screening recommendations, colorectal adenocarcinoma (CRC) still accounts for almost 140000 cancer cases yearly. Familial adenomatous polyposis (FAP) is a colon cancer predisposition due to alterations in the adenomatous polyposis coli gene, which is mutated in most CRC. Since the beginning of the genomic era, next-generation sequencing analyses of CRC have continued to improve our understanding of the genetics of tumorigenesis and promise to expand our ability to identify and treat this disease. Advances in genome sequence analysis have facilitated the molecular diagnosis of individuals with FAP, which enables initiation of appropriate monitoring and timely intervention. Genome sequencing also has potential clinical impact for individuals with sporadic forms of CRC, providing means for molecular diagnosis of CRC tumor type, data guiding selection of tumor-targeted therapies, and pharmacogenomic profiles specifying patient-specific drug tolerances. There is even a potential role for genomic sequencing in surveillance for recurrence, and early detection, of CRC. We review strategies for diagnostic assessment and management of FAP and sporadic CRC in the current genomic era, with emphasis on the current, and potential for future, impact of genome sequencing on the clinical care of these conditions.

    View details for DOI 10.5306/wjco.v5.i5.1036

    View details for PubMedID 25493239

    View details for PubMedCentralID PMC4259930

  • Widespread contribution of transposable elements to the innovation of gene regulatory networks GENOME RESEARCH Sundaram, V., Cheng, Y., Ma, Z., Li, D., Xing, X., Edge, P., Snyder, M. P., Wang, T. 2014; 24 (12): 1963-1976

    Abstract

    Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

    View details for DOI 10.1101/gr.168872.113

    View details for Web of Science ID 000345810600005

    View details for PubMedID 25319995

    View details for PubMedCentralID PMC4248313

  • Genome-wide map of regulatory interactions in the human genome GENOME RESEARCH Heidari, N., Phanstiel, D. H., He, C., Grubert, F., Jahanbani, F., Kasowski, M., Zhang, M. Q., Snyder, M. P. 2014; 24 (12): 1905-1917

    Abstract

    Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and revealed how the distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type-specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions, while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.

    View details for DOI 10.1101/gr.176586.114

    View details for PubMedID 25228660

  • A comparative encyclopedia of DNA elements in the mouse genome NATURE Yue, F., Cheng, Y., Breschi, A., Vierstra, J., Wu, W., Ryba, T., Sandstrom, R., Ma, Z., Davis, C., Pope, B. D., Shen, Y., Pervouchine, D. D., Djebali, S., Thurman, R. E., Kaul, R., Rynes, E., Kirilusha, A., Marinov, G. K., Williams, B. A., Trout, D., Amrhein, H., Fisher-Aylor, K., Antoshechkin, I., DeSalvo, G., See, L., Fastuca, M., Drenkow, J., Zaleski, C., Dobin, A., Prieto, P., Lagarde, J., Bussotti, G., Tanzer, A., Denas, O., Li, K., Bender, M. A., Zhang, M., Byron, R., Groudine, M. T., McCleary, D., Pham, L., Ye, Z., Kuan, S., Edsall, L., Wu, Y., Rasmussen, M. D., Bansal, M. S., Kellis, M., Keller, C. A., Morrissey, C. S., Mishra, T., Jain, D., Dogan, N., Harris, R. S., Cayting, P., Kawli, T., Boyle, A. P., Euskirchen, G., Kundaje, A., Lin, S., Lin, Y., Jansen, C., Malladi, V. S., Cline, M. S., Erickson, D. T., Kirkup, V. M., Learned, K., Sloan, C. A., Rosenbloom, K. R., De Sousa, B. L., Beal, K., Pignatelli, M., Flicek, P., Lian, J., Kahveci, T., Lee, D., Kent, W. J., Santos, M. R., Herrero, J., Notredame, C., Johnson, A., Vong, S., Lee, K., Bates, D., Neri, F., Diegel, M., Canfield, T., Sabo, P. J., Wilken, M. S., Reh, T. A., Giste, E., Shafer, A., Kutyavin, T., Haugen, E., Dunn, D., Reynolds, A. P., Neph, S., Humbert, R., Hansen, R. S., de Bruijn, M., Selleri, L., Rudensky, A., Josefowicz, S., Samstein, R., Eichler, E. E., Orkin, S. H., Levasseur, D., Papayannopoulou, T., Chang, K., Skoultchi, A., Gosh, S., Disteche, C., Treuting, P., Wang, Y., Weiss, M. J., Blobel, G. A., Cao, X., Zhong, S., Wang, T., Good, P. J., Lowdon, R. F., Adams, L. B., Zhou, X., Pazin, M. J., Feingold, E. A., Wold, B., Taylor, J., Mortazavi, A., Weissman, S. M., Stamatoyannopoulos, J. A., Snyder, M. P., Guigo, R., Gingeras, T. R., Gilbert, D. M., Hardison, R. C., Beer, M. A., Ren, B. 2014; 515 (7527): 355-364

    Abstract

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

    View details for DOI 10.1038/nature13992

    View details for Web of Science ID 000345770600034

  • A comparative encyclopedia of DNA elements in the mouse genome. Nature Yue, F., Cheng, Y., Breschi, A., Vierstra, J., Wu, W., Ryba, T., Sandstrom, R., Ma, Z., Davis, C., Pope, B. D., Shen, Y., Pervouchine, D. D., Djebali, S., Thurman, R. E., Kaul, R., Rynes, E., Kirilusha, A., Marinov, G. K., Williams, B. A., Trout, D., Amrhein, H., Fisher-Aylor, K., Antoshechkin, I., DeSalvo, G., See, L., Fastuca, M., Drenkow, J., Zaleski, C., Dobin, A., Prieto, P., Lagarde, J., Bussotti, G., Tanzer, A., Denas, O., Li, K., Bender, M. A., Zhang, M., Byron, R., Groudine, M. T., McCleary, D., Pham, L., Ye, Z., Kuan, S., Edsall, L., Wu, Y., Rasmussen, M. D., Bansal, M. S., Kellis, M., Keller, C. A., Morrissey, C. S., Mishra, T., Jain, D., Dogan, N., Harris, R. S., Cayting, P., Kawli, T., Boyle, A. P., Euskirchen, G., Kundaje, A., Lin, S., Lin, Y., Jansen, C., Malladi, V. S., Cline, M. S., Erickson, D. T., Kirkup, V. M., Learned, K., Sloan, C. A., Rosenbloom, K. R., Lacerda de Sousa, B., Beal, K., Pignatelli, M., Flicek, P., Lian, J., Kahveci, T., Lee, D., Kent, W. J., Ramalho Santos, M., Herrero, J., Notredame, C., Johnson, A., Vong, S., Lee, K., Bates, D., Neri, F., Diegel, M., Canfield, T., Sabo, P. J., Wilken, M. S., Reh, T. A., Giste, E., Shafer, A., Kutyavin, T., Haugen, E., Dunn, D., Reynolds, A. P., Neph, S., Humbert, R., Hansen, R. S., de Bruijn, M., Selleri, L., Rudensky, A., Josefowicz, S., Samstein, R., Eichler, E. E., Orkin, S. H., Levasseur, D., Papayannopoulou, T., Chang, K., Skoultchi, A., Gosh, S., Disteche, C., Treuting, P., Wang, Y., Weiss, M. J., Blobel, G. A., Cao, X., Zhong, S., Wang, T., Good, P. J., Lowdon, R. F., Adams, L. B., Zhou, X., Pazin, M. J., Feingold, E. A., Wold, B., Taylor, J., Mortazavi, A., Weissman, S. M., Stamatoyannopoulos, J. A., Snyder, M. P., Guigo, R., Gingeras, T. R., Gilbert, D. M., Hardison, R. C., Beer, M. A., Ren, B. 2014; 515 (7527): 355-364

    Abstract

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

    View details for DOI 10.1038/nature13992

    View details for PubMedID 25409824

  • Topologically associating domains are stable units of replication-timing regulation. Nature Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., Ren, B., Gilbert, D. M. 2014; 515 (7527): 402-405

    Abstract

    Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. 
    Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.

    View details for DOI 10.1038/nature13986

    View details for PubMedID 25409831

  • Principles of regulatory information conservation between mouse and human NATURE Cheng, Y., Ma, Z., Kim, B., Wu, W., Cayting, P., Boyle, A. P., Sundaram, V., Xing, X., Dogan, N., Li, J., Euskirchen, G., Lin, S., Lin, Y., Visel, A., Kawli, T., Yang, X., Patacsil, D., Keller, C. A., Giardine, B., Kundaje, A., Wang, T., Pennacchio, L. A., Weng, Z., Hardison, R. C., Snyder, M. P. 2014; 515 (7527): 371-?

    Abstract

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

    View details for DOI 10.1038/nature13985

    View details for Web of Science ID 000345770600036

    View details for PubMedCentralID PMC4343047

  • Topologically associating domains are stable units of replication-timing regulation NATURE Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., Ren, B., Gilbert, D. M. 2014; 515 (7527): 402-405

    Abstract

    Eukaryotic chromosomes replicate in a temporal order known as the replication-timing program. In mammals, replication timing is cell-type-specific with at least half the genome switching replication timing during development, primarily in units of 400-800 kilobases ('replication domains'), whose positions are preserved in different cell types, conserved between species, and appear to confine long-range effects of chromosome rearrangements. Early and late replication correlate, respectively, with open and closed three-dimensional chromatin compartments identified by high-resolution chromosome conformation capture (Hi-C), and, to a lesser extent, late replication correlates with lamina-associated domains (LADs). Recent Hi-C mapping has unveiled substructure within chromatin compartments called topologically associating domains (TADs) that are largely conserved in their positions between cell types and are similar in size to replication domains. However, TADs can be further sub-stratified into smaller domains, challenging the significance of structures at any particular scale. Moreover, attempts to reconcile TADs and LADs to replication-timing data have not revealed a common, underlying domain structure. Here we localize boundaries of replication domains to the early-replicating border of replication-timing transitions and map their positions in 18 human and 13 mouse cell types. We demonstrate that, collectively, replication domain boundaries share a near one-to-one correlation with TAD boundaries, whereas within a cell type, adjacent TADs that replicate at similar times obscure replication domain boundaries, largely accounting for the previously reported lack of alignment. 
    Moreover, cell-type-specific replication timing of TADs partitions the genome into two large-scale sub-nuclear compartments revealing that replication-timing transitions are indistinguishable from late-replicating regions in chromatin composition and lamina association and accounting for the reduced correlation of replication timing to LADs and heterochromatin. Our results reconcile cell-type-specific sub-nuclear compartmentalization and replication timing with developmentally stable structural domains and offer a unified model for large-scale chromosome structure and function.

    View details for DOI 10.1038/nature13986

    View details for Web of Science ID 000345770600043

    View details for PubMedCentralID PMC4251741

  • Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease. Pharmacogenomics Esplin, E. D., Oei, L., Snyder, M. P. 2014; 15 (14): 1771-1790

    Abstract

    The potential for personalized sequencing to individually optimize medical treatment in diseases such as cancer and for pharmacogenomic application is just beginning to be realized, and the utility of sequencing healthy individuals for managing health is also being explored. The data produced requires additional advancements in interpretation of variants of unknown significance to maximize clinical benefit. Nevertheless, personalized sequencing, only recently applied to clinical medicine, has already been broadly applied to the discovery and study of disease. It is poised to enable the earlier and more accurate diagnosis of disease risk and occurrence, guide prevention and individualized intervention as well as facilitate monitoring of healthy and treated patients, and play a role in the prevention and recurrence of future disease. This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery and anticipates its role in the future of medicine.

    View details for DOI 10.2217/pgs.14.117

    View details for PubMedID 25493570

    View details for PubMedCentralID PMC4336568

  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway GENETICS IN MEDICINE Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B. 2014; 16 (10): 751-758

    Abstract

    Purpose: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. Methods: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. Results: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. Conclusion: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected.

    View details for DOI 10.1038/gim.2014.22

    View details for Web of Science ID 000342884500005

  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway. Genetics in medicine Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B. 2014; 16 (10): 751-758

    Abstract

    Purpose: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. Methods: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. Results: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. Conclusion: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected.

    View details for DOI 10.1038/gim.2014.22

    View details for PubMedID 24651605

  • Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics Phanstiel, D. H., Boyle, A. P., Araya, C. L., Snyder, M. P. 2014; 30 (19): 2808-2810

    Abstract

    Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE). Sushi.R is open source and made publicly available through GitHub (https://github.com/dphansti/Sushi) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/Sushi.html). Contact: mpsnyder@stanford.edu or dphansti@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btu379

    View details for PubMedID 24903420

  • Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures BIOINFORMATICS Phanstiel, D. H., Boyle, A. P., Araya, C. L., Snyder, M. P. 2014; 30 (19): 2808-2810

    Abstract

    Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE). Sushi.R is open source and made publicly available through GitHub (https://github.com/dphansti/Sushi) and Bioconductor (http://bioconductor.org/packages/release/bioc/html/Sushi.html). Contact: mpsnyder@stanford.edu or dphansti@stanford.edu.

    View details for DOI 10.1093/bioinformatics/btu379

    View details for Web of Science ID 000343082900016

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456

    Abstract

    Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weissdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405

    Abstract

    Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Shared functions of plant and mammalian StAR-related lipid transfer (START) domains in modulating transcription factor activity BMC BIOLOGY Schrick, K., Bruno, M., Khosla, A., Cox, P. N., Marlatt, S. A., Roque, R. A., Nguyen, H. C., He, C., Snyder, M. P., Singh, D., Yadav, G. 2014; 12

    Abstract

    Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains were first identified from mammalian proteins that bind lipid/sterol ligands via a hydrophobic pocket. In plants, predicted START domains are predominantly found in homeodomain leucine zipper (HD-Zip) transcription factors that are master regulators of cell-type differentiation in development. Here we utilized studies of Arabidopsis in parallel with heterologous expression of START domains in yeast to investigate the hypothesis that START domains are versatile ligand-binding motifs that can modulate transcription factor activity. Our results show that deletion of the START domain from Arabidopsis Glabra2 (GL2), a representative HD-Zip transcription factor involved in differentiation of the epidermis, results in a complete loss-of-function phenotype, although the protein is correctly localized to the nucleus. Despite low sequence similarity, the mammalian START domain from StAR can functionally replace the HD-Zip-derived START domain. Embedding the START domain within a synthetic transcription factor in yeast, we found that several mammalian START domains from StAR, MLN64 and PCTP stimulated transcription factor activity, as did START domains from two Arabidopsis HD-Zip transcription factors. Mutation of ligand-binding residues within StAR START reduced this activity, consistent with the yeast assay monitoring ligand-binding. The D182L missense mutation in StAR START was shown to affect GL2 transcription factor activity in maintenance of the leaf trichome cell fate. Analysis of in vivo protein-metabolite interactions by mass spectrometry provided direct evidence for analogous lipid-binding activity in mammalian and plant START domains in the yeast system. Structural modeling predicted similar sized ligand-binding cavities of a subset of plant START domains in comparison to mammalian counterparts. The START domain is required for transcription factor activity in HD-Zip proteins from plants, although it is not strictly necessary for the protein's nuclear localization. START domains from both mammals and plants are modular in that they can bind lipid ligands to regulate transcription factor function in a yeast system. The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription. We propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.

    View details for DOI 10.1186/s12915-014-0070-8

    View details for Web of Science ID 000342371100001

    View details for PubMedID 25159688

    View details for PubMedCentralID PMC4169639

  • Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (33): E3366-E3366

    View details for DOI 10.1073/pnas.1410434111

    View details for Web of Science ID 000340438800004

    View details for PubMedID 25275169

    View details for PubMedCentralID PMC4143047

  • Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics Martin, A. R., Costa, H. A., Lappalainen, T., Henn, B. M., Kidd, J. M., Yee, M., Grubert, F., Cann, H. M., Snyder, M., Montgomery, S. B., Bustamante, C. D. 2014; 10 (8)

    Abstract

    Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

    View details for DOI 10.1371/journal.pgen.1004549

    View details for PubMedID 25121757

  • H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency. Cell Benayoun, B. A., Pollina, E. A., Ucar, D., Mahmoudi, S., Karra, K., Wong, E. D., Devarajan, K., Daugherty, A. C., Kundaje, A. B., Mancini, E., Hitz, B. C., Gupta, R., Rando, T. A., Baker, J. C., Snyder, M. P., Cherry, J. M., Brunet, A. 2014; 158 (3): 673-688

    Abstract

    Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.

    View details for DOI 10.1016/j.cell.2014.06.027

    View details for PubMedID 25083876

  • Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proceedings of the National Academy of Sciences of the United States of America Tilgner, H., Grubert, F., Sharon, D., Snyder, M. P. 2014; 111 (27): 9869-9874

    Abstract

    Personal transcriptomes in which all of an individual's genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.

    View details for DOI 10.1073/pnas.1400447111

    View details for PubMedID 24961374

  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway (vol 111, pg 236, 2014) GENETICS IN MEDICINE Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B., Chen, R., FORGE Canada Consortium 2014; 16 (7): 568
  • Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nature biotechnology Buenrostro, J. D., Araya, C. L., Chircus, L. M., Layton, C. J., Chang, H. Y., Snyder, M. P., Greenleaf, W. J. 2014; 32 (6): 562-568

    Abstract

    RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.

    View details for DOI 10.1038/nbt.2880

    View details for PubMedID 24727714

  • Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature Treutlein, B., Brownfield, D. G., Wu, A. R., Neff, N. F., Mantalas, G. L., Espinoza, F. H., Desai, T. J., Krasnow, M. A., Quake, S. R. 2014; 509 (7500): 371-375

    View details for DOI 10.1038/nature13173

    View details for PubMedID 24739965

  • Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues PLOS GENETICS Kukurba, K. R., Zhang, R., Li, X., Smith, K. S., Knowles, D. A., Tan, M. H., Piskol, R., Lek, M., Snyder, M., MacArthur, D. G., Li, J. B., Montgomery, S. B. 2014; 10 (5)

    Abstract

    Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

    View details for DOI 10.1371/journal.pgen.1004304

    View details for Web of Science ID 000337145100010

    View details for PubMedID 24786518

  • Defining functional DNA elements in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J. A., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (17): 6131-6138

    Abstract

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

    View details for DOI 10.1073/pnas.1318948111

    View details for Web of Science ID 000335199000025

    View details for PubMedID 24753594

    View details for PubMedCentralID PMC4035993

  • Extended lifespan and reduced adiposity in mice lacking the FAT10 gene. Proceedings of the National Academy of Sciences of the United States of America Canaan, A., DeFuria, J., Perelman, E., Schultz, V., Seay, M., Tuck, D., Flavell, R. A., Snyder, M. P., Obin, M. S., Weissman, S. M. 2014; 111 (14): 5313-5318

    Abstract

    The HLA-F adjacent transcript 10 (FAT10) is a member of the ubiquitin-like gene family that alters protein function/stability through covalent ligation. Although FAT10 is induced by inflammatory mediators and implicated in immunity, the physiological functions of FAT10 are poorly defined. We report the discovery that FAT10 regulates lifespan through pleiotropic actions on metabolism and inflammation. Median and overall lifespan are increased 20% in FAT10ko mice, coincident with elevated metabolic rate, preferential use of fat as fuel, and dramatically reduced adiposity. This phenotype is associated with metabolic reprogramming of skeletal muscle (i.e., increased AMP kinase activity, β-oxidation and -uncoupling, and decreased triglyceride content). Moreover, knockout mice have reduced circulating glucose and insulin levels and enhanced insulin sensitivity in metabolic tissues, consistent with elevated IL-10 in skeletal muscle and serum. These observations suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.

    View details for DOI 10.1073/pnas.1323426111

    View details for PubMedID 24706839

    View details for PubMedCentralID PMC3986194

  • Haplotype structure and positive selection at TLR1 EUROPEAN JOURNAL OF HUMAN GENETICS Heffelfinger, C., Pakstis, A. J., Speed, W. C., Clark, A. P., Haigh, E., Fang, R., Furtado, M. R., Kidd, K. K., Snyder, M. P. 2014; 22 (4): 551-557

    Abstract

    Toll-like receptor 1, when dimerized with Toll-like receptor 2, is a cell surface receptor that, upon recognition of bacterial lipoproteins, activates the innate immune system. Variants in TLR1 associate with the risk of a variety of medical conditions and diseases, including sepsis, leprosy, tuberculosis, and others. The foremost of these is rs5743618 c.2079T>G(p.(Ile602Ser)), the derived allele of which is associated with reduced risk of sepsis, leprosy, and other diseases. Interestingly, 602Ser, which shows signatures of selection, inhibits TLR1 surface trafficking and subsequent activation of NFκB upon recognition of a ligand. This suggests that reduced TLR1 activity may be beneficial for human health. To better understand TLR1 variation and its link to human health, we have typed all 7 high-frequency missense variants (>5% in at least one population) along with 17 other variants in and around TLR1 in 2548 individuals from 56 populations from around the globe. We have also found additional signatures of selection on missense variants not associated with rs5743618, suggesting that there may be multiple functional alleles under positive selection in this gene.

    View details for DOI 10.1038/ejhg.2013.194

    View details for Web of Science ID 000332938400027

    View details for PubMedID 24002163

    View details for PubMedCentralID PMC3953919

  • Clinical interpretation and implications of whole-genome sequencing. JAMA Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045

    Abstract

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

  • Erratum: A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2014; 32 (3): 291

    View details for DOI 10.1038/nbt0314-291b

    View details for PubMedID 24727780

  • Gene-centric Meta-analysis in 87,736 Individuals of European Ancestry Identifies Multiple Blood-Pressure-Related Loci. American journal of human genetics Tragante, V., Barnes, M. R., Ganesh, S. K., Lanktree, M. B., Guo, W., Franceschini, N., Smith, E. N., Johnson, T., Holmes, M. V., Padmanabhan, S., Karczewski, K. J., Almoguera, B., Barnard, J., Baumert, J., Chang, Y. C., Elbers, C. C., Farrall, M., Fischer, M. E., Gaunt, T. R., Gho, J. M., Gieger, C., Goel, A., Gong, Y., Isaacs, A., Kleber, M. E., Mateo Leach, I., McDonough, C. W., Meijs, M. F., Melander, O., Nelson, C. P., Nolte, I. M., Pankratz, N., Price, T. S., Shaffer, J., Shah, S., Tomaszewski, M., van der Most, P. J., van Iperen, E. P., Vonk, J. M., Witkowska, K., Wong, C. O., Zhang, L., Beitelshees, A. L., Berenson, G. S., Bhatt, D. L., Brown, M., Burt, A., Cooper-DeHoff, R. M., Connell, J. M., Cruickshanks, K. J., Curtis, S. P., Davey-Smith, G., Delles, C., Gansevoort, R. T., Guo, X., Haiqing, S., Hastie, C. E., Hofker, M. H., Hovingh, G. K., Kim, D. S., Kirkland, S. A., Klein, B. E., Klein, R., Li, Y. R., Maiwald, S., Newton-Cheh, C., O'Brien, E. T., Onland-Moret, N. C., Palmas, W., Parsa, A., Penninx, B. W., Pettinger, M., Vasan, R. S., Ranchalis, J. E., Ridker, P. M., Rose, L. M., Sever, P., Shimbo, D., Steele, L., Stolk, R. P., Thorand, B., Trip, M. D., van Duijn, C. M., Verschuren, W. M., Wijmenga, C., Wyatt, S., Young, J. H., Zwinderman, A. H., Bezzina, C. R., Boerwinkle, E., Casas, J. P., Caulfield, M. J., Chakravarti, A., Chasman, D. I., Davidson, K. W., Doevendans, P. A., Dominiczak, A. F., FitzGerald, G. A., Gums, J. G., Fornage, M., Hakonarson, H., Halder, I., Hillege, H. L., Illig, T., Jarvik, G. P., Johnson, J. A., Kastelein, J. J., Koenig, W., Kumari, M., März, W., Murray, S. S., O'Connell, J. R., Oldehinkel, A. J., Pankow, J. S., Rader, D. J., Redline, S., Reilly, M. P., Schadt, E. E., Kottke-Marchant, K., Snieder, H., Snyder, M., Stanton, A. V., Tobin, M. D., Uitterlinden, A. G., van der Harst, P., van der Schouw, Y. T., Samani, N. J., Watkins, H., Johnson, A. D., Reiner, A. P., Zhu, X., de Bakker, P. I., Levy, D., Asselbergs, F. W., Munroe, P. B., Keating, B. J. 2014; 94 (3): 349-360

    Abstract

    Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10^-7) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification.

    View details for DOI 10.1016/j.ajhg.2013.12.016

    View details for PubMedID 24560520

  • Whole-genome haplotyping using long reads and statistical methods NATURE BIOTECHNOLOGY Kuleshov, V., Xie, D., Chen, R., Pushkarev, D., Ma, Z., Blauwkamp, T., Kertesz, M., Snyder, M. 2014; 32 (3): 261-266

    Abstract

    The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.

    View details for DOI 10.1038/nbt.2833

    View details for Web of Science ID 000332819800024

    View details for PubMedID 24561555

  • Ordering and dynamical properties of superbright C-60 molecules on Ag(111) PHYSICAL REVIEW B Li, H. I., Abreu, G. J., Shukla, A. K., Fournee, V., Ledieu, J., Loli, L. N., Rauterkus, S. E., Snyder, M. V., Su, S. Y., Marino, K. E., Diehl, R. D. 2014; 89 (8)
  • Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS genetics Karczewski, K. J., Snyder, M., Altman, R. B., Tatonetti, N. P. 2014; 10 (2)

    Abstract

    Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

    View details for DOI 10.1371/journal.pgen.1004122

    View details for PubMedID 24516403

    View details for PubMedCentralID PMC3916285

  • Landscape and variation of RNA secondary structure across the human transcriptome. Nature Wan, Y., Qu, K., Zhang, Q. C., Flynn, R. A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R. C., Snyder, M. P., Segal, E., Chang, H. Y. 2014; 505 (7485): 706-709

    Abstract

    In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of 'riboSNitches' versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3' untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.

    View details for DOI 10.1038/nature12946

    View details for PubMedID 24476892

  • iPOP and its role in participatory medicine GENOME MEDICINE Snyder, M. 2014; 6

    Abstract

    Michael Snyder shares his thoughts on participatory medicine and how omics profiling could fit into this new model of healthcare where patients are at the center of medicine.

    View details for DOI 10.1186/gm512

    View details for Web of Science ID 000335597000001

    View details for PubMedID 24479626

    View details for PubMedCentralID PMC3978943

  • Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240

    Abstract

    The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.

    View details for PubMedID 24297550

  • Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Kolker, E., Ozdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2014; 18 (1): 10-14

    Abstract

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/omi.2013.0149

    View details for Web of Science ID 000331085100002

    View details for PubMedID 24456465

    View details for PubMedCentralID PMC3903324

  • Metadata Checklist for the Integrated Personal OMICS Study: Proteomics and Metabolomics Experiments OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2014; 18 (1): 81-85

    View details for PubMedID 24456466

    View details for PubMedCentralID PMC3903326

  • Identification of STAT5A and STAT5B target genes in human T cells. PloS one Kanai, T., Seki, S., Jenks, J. A., Kohli, A., Kawli, T., Martin, D. P., Snyder, M., Bacchetta, R., Nadeau, K. C. 2014; 9 (1)

    Abstract

    Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals. STAT5 refers to two highly related proteins, STAT5A and STAT5B, with critical function: their complete deficiency is lethal in mice; in humans, STAT5B deficiency alone leads to endocrine and immunological problems, while STAT5A deficiency has not been reported. STAT5A and STAT5B show peptide sequence similarities greater than 90%, but subtle structural differences suggest possible non-redundant roles in gene regulation. However, these roles remain unclear in humans. We applied chromatin immunoprecipitation followed by DNA sequencing using human CD4(+) T cells to detect candidate genes regulated by STAT5A and/or STAT5B, and quantitative-PCR in STAT5A or STAT5B knock-down (KD) human CD4(+) T cells to validate the findings. Our data show STAT5A and STAT5B play redundant roles in cell proliferation and apoptosis via SGK1 interaction. Interestingly, we found a novel, unique role for STAT5A in binding to genes involved in neural development and function (NDRG1, DNAJC6, and SSH2), while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding. Our results also suggest that one or more co-activators for STAT5A and/or STAT5B may play important roles in establishing different binding abilities and gene regulation behaviors. The new identification of these genes regulated by STAT5A and/or STAT5B has major implications for understanding the pathophysiology of cancer progression, neural disorders, and immune abnormalities.

    View details for DOI 10.1371/journal.pone.0086790

    View details for PubMedID 24497979

    View details for PubMedCentralID PMC3907443

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss. BMC genomics Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15: 1155-?

    Abstract

    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 x 10^-7). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for PubMedID 25528277

  • Global analysis of transcription factor-binding sites in yeast using ChIP-Seq. Methods in molecular biology (Clifton, N.J.) Lefrançois, P., Gallagher, J. E., Snyder, M. 2014; 1205: 231-255

    Abstract

    Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28-36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy.

    View details for DOI 10.1007/978-1-4939-1363-3_15

    View details for PubMedID 25213249
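
    The barcode-parsing step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the chapter's actual pipeline: the barcode sequences, sample names, and the `demultiplex` helper are hypothetical, and the sketch assumes the barcode sits at the start of each read and must match exactly.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name (hypothetical).
    Reads whose prefix matches no barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the genomic part of the read for alignment.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "tagged_TF", "TGCA": "untagged_control"}
reads = ["ACGTTTGACCA", "TGCAGGATCCA", "ACGTCCCGGGA", "GGGGAAAATTT"]
result = demultiplex(reads, barcodes)
```

    After this step, each sample's trimmed reads would be aligned separately against the reference genome. A real workflow would likely also tolerate sequencing errors in the barcode, which this sketch does not.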

  • Strain Kaplan of Pseudorabies Virus Genome Sequenced by PacBio Single-Molecule Real-Time Sequencing Technology. Genome announcements Tombácz, D., Sharon, D., Oláh, P., Csabai, Z., Snyder, M., Boldogkoi, Z. 2014; 2 (4)

    Abstract

    Pseudorabies virus (PRV) is a neurotropic herpesvirus that causes Aujeszky's disease in pigs. PRV strains are widely used as transsynaptic tracers for mapping neural circuits. We present here the complete and fully annotated genome sequence of strain Kaplan of PRV, determined by Pacific Biosciences RSII long-read sequencing technology.

    View details for DOI 10.1128/genomeA.00628-14

    View details for PubMedID 25035325

    View details for PubMedCentralID PMC4102862

  • Serum profiling using protein microarrays to identify disease related antigens. Methods in molecular biology (Clifton, N.J.) Sharon, D., Snyder, M. 2014; 1176: 169-178

    Abstract

    Disease related antigens are of great importance in the clinic. They are used as markers to screen patients for various forms of cancer, to monitor response to therapy, or to serve as therapeutic targets (Chapman et al., Ann Oncol 18(5):868-873, 2007; Soussi et al., Cancer Res 60:1777-1788, 2000; Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Levenson, Biochim Biophys Acta 1770:847-856, 2007). In cancer, endogenous levels of protein expression may be disrupted or proteins may be expressed in an aberrant fashion, resulting in an immune response that bypasses self tolerance (Soussi et al., Cancer Res 60:1777-1788, 2000; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Molina et al., Breast Cancer Res Treat 51:109-119, 1998). Protein microarrays, which represent a large fraction of the human proteome, have been used to identify antigens in multiple diseases including cancer (Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007; Beyer et al., J Neuroimmunol 242:26-32, 2012). Typically, arrays are probed with immunoglobulin (Ig) samples from patients as well as healthy controls, then compared to determine which antigens (Ag's) are more reactive within the patient group (Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499).

    View details for DOI 10.1007/978-1-4939-0992-6_14

    View details for PubMedID 25030927

    View details for PubMedCentralID PMC4420618

  • Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease PHARMACOGENOMICS Esplin, E. D., Oei, L., Snyder, M. P. 2014; 15 (14): 1771-1790

    Abstract

    The potential for personalized sequencing to individually optimize medical treatment in diseases such as cancer and for pharmacogenomic application is just beginning to be realized, and the utility of sequencing healthy individuals for managing health is also being explored. The data produced requires additional advancements in interpretation of variants of unknown significance to maximize clinical benefit. Nevertheless, personalized sequencing, only recently applied to clinical medicine, has already been broadly applied to the discovery and study of disease. It is poised to enable the earlier and more accurate diagnosis of disease risk and occurrence, guide prevention and individualized intervention as well as facilitate monitoring of healthy and treated patients, and play a role in the prevention and recurrence of future disease. This article documents the advancing capacity of personalized sequencing, reviews its impact on disease-oriented scientific discovery and anticipates its role in the future of medicine.

    View details for DOI 10.2217/pgs.14.117

    View details for Web of Science ID 000346180100006

    View details for PubMedCentralID PMC4336568

  • STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PloS one Karczewski, K. J., Fernald, G. H., Martin, A. R., Snyder, M., Tatonetti, N. P., Dudley, J. T. 2014; 9 (1)

    Abstract

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

    View details for DOI 10.1371/journal.pone.0084860

    View details for PubMedID 24454756

  • Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes JOURNAL OF PROTEOME RESEARCH Menon, R., Im, H., Zhang, E. (., Wu, S., Chen, R., Snyder, M., Hancock, W. S., Omenn, G. S. 2014; 13 (1): 212-227

    Abstract

    This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing breast cancer cell-line models of hormone-receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (overexpression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER-/PR-; inflammatory breast cancer), and SUM149 (ERBB2 (low expression) ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes that are in the signaling pathways downstream of ERBB2 along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line specific variants pointed to distinct key mechanisms including: amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids in SUM190; cell-to-cell adhesion, integrin, and ERK1/ERK2 signaling; and translational control in SUM149. The analyses indicated an enrichment in the electron transport chain processes in the ERBB2 overexpressed cell line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149. Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.

    View details for DOI 10.1021/pr400773v

    View details for Web of Science ID 000329472700022

    View details for PubMedID 24111759

  • Chromatin immunoprecipitation and multiplex sequencing (ChIP-Seq) to identify global transcription factor binding sites in the nematode Caenorhabditis elegans. Methods in enzymology Brdlik, C. M., Niu, W., Snyder, M. 2014; 539: 89-111

    Abstract

    The global identification of transcription factor (TF) binding sites is a critical step in the elucidation of the functional elements of the genome. Several methods have been developed that map TF binding in human cells, yeast, and other model organisms. These methods make use of chromatin immunoprecipitation, or ChIP, and take advantage of the fact that formaldehyde fixation of living cells can be used to cross-link DNA sequences to the TFs that bind them in vivo. In ChIP, the cross-linked TF-DNA complexes are sheared by sonication, size fractionated, and incubated with antibody specific to the TF of interest to generate a library of TF-bound DNA sequences. ChIP-chip was the first technology developed to globally identify TF-bound DNA sequences and involves subsequent hybridization of the ChIP DNA to oligonucleotide microarrays. However, ChIP-chip proved to be costly, labor-intensive, and limited by the fixed number of probes available on the microarray chip. ChIP-Seq combines ChIP with massively parallel high-throughput sequencing (see Explanatory Chapter: Next Generation Sequencing) and has demonstrated vast improvement over ChIP-chip with respect to time and cost, signal-to-noise ratio, and resolution. In particular, multiplex sequencing can be used to achieve a higher throughput in ChIP-Seq analyses involving organisms with genomes of lower complexity than that of human (Lefrançois et al., 2009) and thereby reduce the cost and amount of time needed for each result. The multiplex ChIP-Seq method described in this section has been developed for Caenorhabditis elegans, but is easily adaptable for other organisms.

    View details for DOI 10.1016/B978-0-12-420120-0.00007-4

    View details for PubMedID 24581441

  • STAT3 Targets Suggest Mechanisms of Aggressive Tumorigenesis in Diffuse Large B-Cell Lymphoma G3-GENES GENOMES GENETICS Hardee, J., Ouyang, Z., Zhang, Y., Kundaje, A., Lacroute, P., Snyder, M. 2013; 3 (12): 2173-2185

    Abstract

    The signal transducer and activator of transcription 3 (STAT3) is a transcription factor that, when dysregulated, becomes a powerful oncogene found in many human cancers, including diffuse large B-cell lymphoma. Diffuse large B-cell lymphoma is the most common form of non-Hodgkin's lymphoma and has two major subtypes: germinal center B-cell-like and activated B-cell-like. Compared with the germinal center B-cell-like form, activated B-cell-like lymphomas respond much more poorly to current therapies and often exhibit overexpression or overactivation of STAT3. To investigate how STAT3 might contribute to this aggressive phenotype, we have integrated genome-wide studies of STAT3 DNA binding using chromatin immunoprecipitation-sequencing with whole-transcriptome profiling using RNA-sequencing. STAT3 binding sites are present near almost a third of all genes that differ in expression between the two subtypes, and examination of the affected genes identified previously undetected and clinically significant pathways downstream of STAT3 that drive oncogenesis. Novel treatments aimed at these pathways may increase the survivability of activated B-cell-like diffuse large B-cell lymphoma.

    View details for DOI 10.1534/g3.113.007674

    View details for PubMedID 24142927

  • Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications. Big data Kolker, E., Özdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2013; 1 (4): 196-201

    Abstract

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/big.2013.0039

    View details for PubMedID 27447251

    View details for Web of Science ID 000209646300005

  • Metadata Checklist for the Integrated Personal Omics Study: Proteomics and Metabolomics Experiments. Big data Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2013; 1 (4): 202-206

    Abstract

    The integrative personal omics profiling study introduced a novel, integrative approach based on personalized, longitudinal, multi-omics data. The study collected genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. The results revealed various medical risks and extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. The current article is a data publication that provides the checklists for the metadata of the proteomics (see Table 1) and metabolomics (see Table 2) datasets of the study. The proposed checklist was recently developed and endorsed by the Data-Enabled Life Sciences Alliance (DELSA Global). We call for the broader use of data publications using the metadata checklist to make omics data more discoverable, interpretable, and reusable, while enabling appropriate attribution to data generators and infrastructure science builders.

    View details for DOI 10.1089/big.2013.0040

    View details for PubMedID 27447252

    View details for Web of Science ID 000209646300006

  • Impacts of variation in the human genome on gene regulation. Journal of molecular biology Haraksingh, R. R., Snyder, M. P. 2013; 425 (21): 3970-3977

    Abstract

    Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.

    View details for DOI 10.1016/j.jmb.2013.07.015

    View details for PubMedID 23871684

  • Defective sphingosine 1-phosphate receptor 1 (S1P1) phosphorylation exacerbates TH17-mediated autoimmune neuroinflammation. Nature immunology Garris, C. S., Wu, L., Acharya, S., Arac, A., Blaho, V. A., Huang, Y., Moon, B. S., Axtell, R. C., Ho, P. P., Steinberg, G. K., Lewis, D. B., Sobel, R. A., Han, D. K., Steinman, L., Snyder, M. P., Hla, T., Han, M. H. 2013; 14 (11): 1166-1172

    Abstract

    Sphingosine 1-phosphate (S1P) signaling regulates lymphocyte egress from lymphoid organs into systemic circulation. The sphingosine phosphate receptor 1 (S1P1) agonist FTY-720 (Gilenya) arrests immune trafficking and prevents multiple sclerosis (MS) relapses. However, alternative mechanisms of S1P-S1P1 signaling have been reported. Phosphoproteomic analysis of MS brain lesions revealed S1P1 phosphorylation on S351, a residue crucial for receptor internalization. Mutant mice harboring an S1pr1 gene encoding phosphorylation-deficient receptors (S1P1(S5A)) developed severe experimental autoimmune encephalomyelitis (EAE) due to autoimmunity mediated by interleukin 17 (IL-17)-producing helper T cells (TH17 cells) in the peripheral immune and nervous system. S1P1 directly activated the Jak-STAT3 signal-transduction pathway via IL-6. Impaired S1P1 phosphorylation enhances TH17 polarization and exacerbates autoimmune neuroinflammation. These mechanisms may be pathogenic in MS.

    View details for DOI 10.1038/ni.2730

    View details for PubMedID 24076635

  • Comprehensive whole-genome sequencing of an early-stage primary myelofibrosis patient defines low mutational burden and non-recurrent candidate genes. Haematologica Merker, J. D., Roskin, K. M., Ng, D., Pan, C., Fisk, D. G., King, J. J., Hoh, R., Stadler, M., Okumoto, L. M., Abidi, P., Hewitt, R., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, A. M., George, T. I., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. 2013; 98 (11): 1689-1696

    Abstract

    In order to identify novel somatic mutations associated with classic BCR/ABL1-negative myeloproliferative neoplasms, we performed high-coverage genome sequencing of DNA from peripheral blood granulocytes and cultured skin fibroblasts from a patient with MPL W515K-positive primary myelofibrosis. The primary myelofibrosis genome had a low somatic mutation rate, consistent with that observed in similar hematopoietic tumor genomes. Interfacing of whole-genome DNA sequence data with RNA expression data identified three somatic mutations of potential functional significance: a nonsense mutation in CARD6, implicated in modulation of NF-kappaB activation; a 19-base pair deletion involving a potential regulatory region in the 5'-untranslated region of BRD2, implicated in transcriptional regulation and cell cycle control; and a non-synonymous point mutation in KIAA0355, an uncharacterized protein. Additional mutations in three genes (CAP2, SOX30, and MFRP) were also evident, albeit with no support for expression at the RNA level. Re-sequencing of these six genes in 178 patients with polycythemia vera, essential thrombocythemia, and myelofibrosis did not identify recurrent somatic mutations in these genes. Finally, we describe methods for reducing false-positive variant calls in the analysis of hematologic malignancies with a low somatic mutation rate. This trial is registered with ClinicalTrials.gov (NCT01108159).

    View details for DOI 10.3324/haematol.2013.092379

    View details for PubMedID 23872309

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014

    Abstract

    Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Incorporating Motif Analysis into Gene Co-expression Networks Reveals Novel Modular Expression Pattern and New Signaling Pathways PLOS GENETICS Ma, S., Shah, S., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2013; 9 (10)

    Abstract

    Understanding of gene regulatory networks requires discovery of expression modules within gene co-expression networks and identification of promoter motifs and corresponding transcription factors that regulate their expression. A commonly used method for this purpose is a top-down approach based on clustering the network into a range of densely connected segments, treating these segments as expression modules, and extracting promoter motifs from these modules. Here, we describe a novel bottom-up approach to identify gene expression modules driven by known cis-regulatory motifs in the gene promoters. For a specific motif, genes in the co-expression network are ranked according to their probability of belonging to an expression module regulated by that motif. The ranking is conducted via motif enrichment or motif position bias analysis. Our results indicate that motif position bias analysis is an effective tool for genome-wide motif analysis. Sub-networks containing the top ranked genes are extracted and analyzed for inherent gene expression modules. This approach identified novel expression modules for the G-box, W-box, site II, and MYB motifs from an Arabidopsis thaliana gene co-expression network based on the graphical Gaussian model. The novel expression modules include those involved in house-keeping functions, primary and secondary metabolism, and abiotic and biotic stress responses. In addition to confirmation of previously described modules, we identified modules that include new signaling pathways. To associate transcription factors that regulate genes in these co-expression modules, we developed a novel reporter system. Using this approach, we evaluated MYB transcription factor-promoter interactions within MYB motif modules.

    View details for DOI 10.1371/journal.pgen.1003840

    View details for Web of Science ID 000330367200023

    View details for PubMedID 24098147

    View details for PubMedCentralID PMC3789834

  • Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations AMERICAN JOURNAL OF HUMAN GENETICS Franceschini, N., Fox, E., Zhang, Z., Edwards, T. L., Nalls, M. A., Sung, Y. J., Tayo, B. O., Sun, Y. V., Gottesman, O., Adeyemo, A., Johnson, A. D., Young, J. H., Rice, K., Duan, Q., Chen, F., Li, Y., Tang, H., Fornage, M., Keene, K. L., Andrews, J. S., Smith, J. A., Fau, J. D., Guangfa, Z., Guo, W., Liu, Y., Murray, S. S., Musani, S. K., Srinivasan, S., Edwards, D. R., Wang, H., Becker, L. C., Bovet, P., Bochud, M., Broecke, U., Burnier, M., Carty, C., Chasman, D. I., Ehret, G., Chen, W., Chen, G., Chen, W., Ding, J., Dreisbach, A. W., Evans, M. K., Guo, X., Garcia, M. E., Jensen, R., Keller, M. E., Lettre, G., Lotay, V., Martin, L. W., Moore, J. H., Morrison, A. C., Mosley, T. H., Ogunniyi, A., Palmas, W., Papanicolaou, G., Penman, A., Polak, J. F., Ridker, P. M., Salako, B., Singleton, A. B., Shriner, D., Taylor, K. D., Vasan, R., Wiggins, K., Williams, S. M., Yanek, L. R., Zhao, W., Zonderman, A. B., Becker, D. M., Berenson, G., Boerwinkle, E., Bottinger, E., Cushman, M., Eaton, C., Nyberg, F., Heiss, G., Hirschhron, J. N., Howard, V. J., Karczewsk, K. J., Lanktree, M. B., Liu, K., Liu, Y., Loos, R., Margolis, K., Snyder, M., Psaty, B. M., Schork, N. J., Weir, D. R., Rotimi, C. N., Sale, M. M., Harris, T., Kardia, S. L., Hunt, S. C., Arnett, D., Redline, S., Cooper, R. S., Risch, N. J., Rao, D. C., Rotter, J. I., Chakravarti, A., Reiner, A. P., Levy, D., Keating, B. J., Zhu, X. 2013; 93 (3): 545-554

    Abstract

    High blood pressure (BP) is more prevalent and contributes to more severe manifestations of cardiovascular disease (CVD) in African Americans than in any other United States ethnic group. Several small African-ancestry (AA) BP genome-wide association studies (GWASs) have been published, but their findings have failed to replicate to date. We report on a large AA BP GWAS meta-analysis that includes 29,378 individuals from 19 discovery cohorts and subsequent replication in additional samples of AA (n = 10,386), European ancestry (EA) (n = 69,395), and East Asian ancestry (n = 19,601). Five loci (EVX1-HOXA, ULK4, RSPO3, PLEKHG1, and SOX6) reached genome-wide significance (p < 1.0 × 10^-8) for either systolic or diastolic BP in a transethnic meta-analysis after correction for multiple testing. Three of these BP loci (EVX1-HOXA, RSPO3, and PLEKHG1) lack previous associations with BP. We also identified one independent signal in a known BP locus (SOX6) and provide evidence for fine mapping in four additional validated BP loci. We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.

    View details for DOI 10.1016/j.ajhg.2013.07.010

    View details for Web of Science ID 000330268900014

    View details for PubMedID 23972371

    View details for PubMedCentralID PMC3769920

  • Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females SCIENCE Poznik, G. D., Henn, B. M., Yee, M., Sliwerska, E., Euskirchen, G. M., Lin, A. A., Snyder, M., Quintana-Murci, L., Kidd, J. M., Underhill, P. A., Bustamante, C. D. 2013; 341 (6145): 562-565

    Abstract

    The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (T(MRCA)) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome T(MRCA) to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

    View details for DOI 10.1126/science.1237619

    View details for Web of Science ID 000322586700057

    View details for PubMedID 23908239

  • Genome-wide profiling of human cap-independent translation-enhancing elements NATURE METHODS Wellensiek, B. P., Larsen, A. C., Stephens, B., Kukurba, K., Waern, K., Briones, N., Liu, L., Snyder, M., Jacobs, B. L., Kumar, S., Chaput, J. C. 2013; 10 (8): 747-750

    Abstract

    We report an in vitro selection strategy to identify RNA sequences that mediate cap-independent initiation of translation. This method entails mRNA display of trillions of genomic fragments, selection for initiation of translation and high-throughput deep sequencing. We identified >12,000 translation-enhancing elements (TEEs) in the human genome, generated a high-resolution map of human TEE-bearing regions (TBRs), and validated the function of a subset of sequences in vitro and in cultured cells.

    View details for DOI 10.1038/nmeth.2522

    View details for Web of Science ID 000322453600023

    View details for PubMedID 23770754

  • Functional genomic screen of human stem cell differentiation reveals pathways involved in neurodevelopment and neurodegeneration PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhang, Y., Schulz, V. P., Reed, B. D., Wang, Z., Pan, X., Mariani, J., Euskirchen, G., Snyder, M. P., Vaccarino, F. M., Ivanova, N., Weissman, S. M., Szekely, A. M. 2013; 110 (30): 12361-12366

    Abstract

    Human embryonic stem cells (hESCs) can be induced and differentiated to form a relatively homogeneous population of neuronal precursors in vitro. We have used this system to screen for genes necessary for neural lineage development by using a pooled human short hairpin RNA (shRNA) library screen and massively parallel sequencing. We confirmed known genes and identified several unpredicted genes with interrelated functions that were specifically required for the formation or survival of neuronal progenitor cells without interfering with the self-renewal capacity of undifferentiated hESCs. Among these are several genes that have been implicated in various neurodevelopmental disorders (i.e., brain malformations, mental retardation, and autism). Unexpectedly, a set of genes mutated in late-onset neurodegenerative disorders and with roles in the formation of RNA granules were also found to interfere with neuronal progenitor cell formation, suggesting their functional relevance in early neurogenesis. This study advances the feasibility and utility of using pooled shRNA libraries in combination with next-generation sequencing for a high-throughput, unbiased functional genomic screen. Our approach can also be used with patient-specific human-induced pluripotent stem cell-derived neural models to obtain unparalleled insights into developmental and degenerative processes in neurological or neuropsychiatric disorders with monogenic or complex inheritance.

    View details for DOI 10.1073/pnas.1309725110

    View details for Web of Science ID 000322112300054

    View details for PubMedID 23836664

    View details for PubMedCentralID PMC3725080

  • Variation and genetic control of protein abundance in humans NATURE Wu, L., Candille, S. I., Choi, Y., Xie, D., Jiang, L., Li-Pook-Than, J., Tang, H., Snyder, M. 2013; 499 (7456): 79-82

    Abstract

    Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.

    View details for DOI 10.1038/nature12223

    View details for Web of Science ID 000321285600037

    View details for PubMedID 23676674

  • Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis VIRUSES-BASEL Qian, F., Chung, L., Zheng, W., Bruno, V., Alexander, R. P., Wang, Z., Wang, X., Kurscheid, S., Zhao, H., Fikrig, E., Gerstein, M., Snyder, M., Montgomery, R. R. 2013; 5 (7): 1664-1681

    Abstract

    The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals--and for targeting therapeutics--in multiple biological settings.

    View details for DOI 10.3390/v5071664

    View details for Web of Science ID 000322172200005

    View details for PubMedID 23881275

    View details for PubMedCentralID PMC3738954

  • Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer. Journal of proteome research Zhang, E. Y., Cristofanilli, M., Robertson, F., Reuben, J. M., Mu, Z., Beavis, R. C., Im, H., Snyder, M., Hofree, M., Ideker, T., Omenn, G. S., Fanayan, S., Jeong, S., Paik, Y., Zhang, A. F., Wu, S., Hancock, W. S. 2013; 12 (6): 2805-2817

    Abstract

    In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. 
    This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines: growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and Catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1), plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signaling; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), Serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.

    View details for DOI 10.1021/pr4001527

    View details for PubMedID 23647160

  • Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circulation research Churko, J. M., Mantalas, G. L., Snyder, M. P., Wu, J. C. 2013; 112 (12): 1613-1623

    Abstract

    High throughput sequencing technologies have become essential in studies on genomics, epigenomics, and transcriptomics. Although sequencing information has traditionally been elucidated using a low throughput technique called Sanger sequencing, high throughput sequencing technologies are capable of sequencing multiple DNA molecules in parallel, enabling hundreds of millions of DNA molecules to be sequenced at a time. This advantage allows high throughput sequencing to be used to create large data sets, generating more comprehensive insights into the cellular genomic and transcriptomic signatures of various diseases and developmental stages. Within high throughput sequencing technologies, whole exome sequencing can be used to identify novel variants and other mutations that may underlie many genetic cardiac disorders, whereas RNA sequencing can be used to analyze how the transcriptome changes. Chromatin immunoprecipitation sequencing and methylation sequencing can be used to identify epigenetic changes, whereas ribosome sequencing can be used to determine which mRNA transcripts are actively being translated. In this review, we will outline the differences in various sequencing modalities and examine the main sequencing platforms on the market in terms of their relative read depths, speeds, and costs. Finally, we will discuss the development of future sequencing platforms and how these new technologies may improve on current sequencing platforms. Ultimately, these sequencing technologies will be instrumental in further delineating how the cardiovascular system develops and how perturbations in DNA and RNA can lead to cardiovascular disease.

    View details for DOI 10.1161/CIRCRESAHA.113.300939

    View details for PubMedID 23743227

  • Metabolomics as a robust tool in systems biology and personalized medicine: an open letter to the metabolomics community METABOLOMICS Snyder, M., Li, X. 2013; 9 (3): 532-534

  • iPOP Goes the World: Integrated Personalized Omics Profiling and the Road toward Improved Health Care. Chemistry & biology Li-Pook-Than, J., Snyder, M. 2013; 20 (5): 660-666

    Abstract

    The health of an individual depends upon their DNA as well as upon environmental factors (environome or exposome). It is expected that although the genome is the blueprint of an individual, its analysis with that of the other omes such as the DNA methylome, the transcriptome, proteome, and metabolome will further provide a dynamic assessment of the physiology and health state of an individual. This review will help to categorize the current progress of omics analyses and how omics integration can be used for medical research. We believe that integrative personal omics profiling (iPOP) is a stepping stone to a new road to personalized health care and may improve disease risk assessment, accuracy of diagnosis, disease monitoring, targeted treatments, and understanding the biological processes of disease states for their prevention.

    View details for DOI 10.1016/j.chembiol.2013.05.001

    View details for PubMedID 23706632

  • Identification of Potential Glycan Cancer Markers with Sialic Acid Attached to Sialic Acid and Up-regulated Fucosylated Galactose Structures in Epidermal Growth Factor Receptor Secreted from A431 Cell Line. Molecular & cellular proteomics Wu, S., Taylor, A. D., Lu, Q., Hanash, S. M., Im, H., Snyder, M., Hancock, W. S. 2013; 12 (5): 1239-1249

    Abstract

    We have used powerful HPLC-mass spectrometric approaches to characterize the secreted form of epidermal growth factor receptor (sEGFR). We demonstrated that the amino acid sequence lacked the cytoplasmic domain and was consistent with the primary sequence reported for EGFR purified from a human plasma pool. One of the sEGFR forms, attributed to the alternative RNA splicing, was also confirmed by transcriptional analysis (RNA sequencing). Two unusual types of glycan structures were observed in sEGFR as compared with membrane-bound EGFR from the A431 cell line. The unusual glycan structures were di-sialylated glycans (sialic acid attached to sialic acid) at Asn-151 and N-acetylhexosamine attached to a branched fucosylated galactose with N-acetylglucosamine moieties (HexNAc-(Fuc)Gal-GlcNAc) at Asn-420. These unusual glycans at specific sites were either present at a much lower level or were not observable in membrane-bound EGFR present in the A431 cell lysate. The observation of these di-sialylated glycan structures was consistent with the observed expression of the corresponding α-N-acetylneuraminide α-2,8-sialyltransferase 2 (ST8SiA2) and α-N-acetylneuraminide α-2,8-sialyltransferase 4 (ST8SiA4), by quantitative real time RT-PCR. The connectivity present at the branched fucosylated galactose was also confirmed by methylation of the glycans followed by analysis with sequential fragmentation in mass spectrometry. We hypothesize that the presence of such glycan structures could promote secretion via anionic or steric repulsion mechanisms and thus facilitate the observation of these glycan forms in the secreted fractions. We plan to use this model system to facilitate the search for novel glycan structures present at specific sites in sEGFR as well as other secreted oncoproteins such as Erbb2 as markers of disease progression in blood samples from cancer patients.

    View details for DOI 10.1074/mcp.M112.024554

    View details for PubMedID 23371026

    View details for PubMedCentralID PMC3650335

  • Preparation of recombinant protein spotted arrays for proteome-wide identification of kinase targets. Current protocols in protein science / editorial board, John E. Coligan ... [et al.] Im, H., Snyder, M. 2013; Chapter 27: Unit 27 4-?

    Abstract

    Protein microarrays allow unique approaches for interrogating global protein interaction networks. Protein arrays can be divided into two categories: antibody arrays and functional protein arrays. Antibody arrays consist of various antibodies and are appropriate for profiling protein abundance and modifications. Functional full-length protein arrays employ full-length proteins with various post-translational modifications. A key advantage of the latter is rapid parallel processing of large number of proteins for studying highly controlled biochemical activities, protein-protein interactions, protein-nucleic acid interactions, and protein-small molecule interactions. This unit presents a protocol for constructing functional yeast protein microarrays for global kinase substrate identification. This approach enables the rapid determination of protein interaction networks in yeast on a proteome-wide level. The same methodology can be readily applied to higher eukaryotic systems with careful consideration of overexpression strategy.

    View details for DOI 10.1002/0471140864.ps2704s72

    View details for PubMedID 23546622

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405 JOURNAL OF PROTEOME RESEARCH Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013; 12 (4): 1732-1742

    Abstract

    As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (∼9800 transcripts per cell line) versus the protein observations (∼1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer associated proteins with differential expression patterns as well as protein networks and pathways which appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.

    View details for DOI 10.1021/pr3010869

    View details for Web of Science ID 000317327500018

    View details for PubMedID 23458625

  • Comparative annotation of functional regions in the human genome using epigenomic data NUCLEIC ACIDS RESEARCH Won, K., Zhang, X., Wang, T., Ding, B., Raha, D., Snyder, M., Ren, B., Wang, W. 2013; 41 (8): 4423-4432

    Abstract

    Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance in identifying enhancers compared with state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.

    View details for DOI 10.1093/nar/gkt143

    View details for Web of Science ID 000318569700014

    View details for PubMedID 23482391

    View details for PubMedCentralID PMC3632130

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405. Journal of proteome research Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013

    View details for DOI 10.1021/pr3010869

    View details for PubMedID 23458625

  • A Major Epigenetic Programming Mechanism Guided by piRNAs DEVELOPMENTAL CELL Huang, X. A., Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2013; 24 (5): 502-516

    Abstract

    A central enigma in epigenetics is how epigenetic factors are guided to specific genomic sites for their function. Previously, we reported that a Piwi-piRNA complex associates with the piRNA-complementary site in the Drosophila genome and regulates its epigenetic state. Here, we report that Piwi-piRNA complexes bind to numerous piRNA-complementary sequences throughout the genome, implicating piRNAs as a major mechanism that guides Piwi and Piwi-associated epigenetic factors to program the genome. To test this hypothesis, we demonstrate that inserting piRNA-complementary sequences to an ectopic site leads to Piwi, HP1a, and Su(var)3-9 recruitment to the site as well as H3K9me2/3 enrichment and reduced RNA polymerase II association, indicating that piRNA is both necessary and sufficient to recruit Piwi and epigenetic factors to specific genomic sites. Piwi deficiency drastically changed the epigenetic landscape and polymerase II profile throughout the genome, revealing the Piwi-piRNA mechanism as a major epigenetic programming mechanism in Drosophila.

    View details for DOI 10.1016/j.devcel.2013.01.023

    View details for Web of Science ID 000316163000005

    View details for PubMedID 23434410

  • Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing G3-GENES GENOMES GENETICS Tilgner, H., Raha, D., Habegger, L., Mohiuddin, M., Gerstein, M., Snyder, M. 2013; 3 (3): 387-397

    Abstract

    Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both datasets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are closely in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making it ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.

    View details for DOI 10.1534/g3.112.004812

    View details for Web of Science ID 000315950000002

    View details for PubMedID 23450794

    View details for PubMedCentralID PMC3583448

  • Personal genomes, quantitative dynamic omics and personalized medicine. Quantitative biology (Beijing, China) Mias, G. I., Snyder, M. 2013; 1 (1): 71-90

    Abstract

    The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease.

    View details for PubMedID 25798291

  • Extensive Transcript Diversity and Novel Upstream Open Reading Frame Regulation in Yeast G3-GENES GENOMES GENETICS Waern, K., Snyder, M. 2013; 3 (2): 343-352

    Abstract

    To understand the diversity of transcripts in yeast (Saccharomyces cerevisiae) we analyzed the transcriptional landscapes for cells grown under 18 different environmental conditions. Each sample was analyzed using RNA-sequencing, and a total of 670,446,084 uniquely mapped reads and 377,263 poly-adenylated end tags were produced. Consistent with previous studies, we find that the majority of yeast genes are expressed under one or more different conditions. By directly comparing the 5' and 3' ends of the transcribed regions, we find extensive differences in transcript ends across many conditions, especially those of stationary phase, growth in grape juice, and salt stimulation, suggesting differential choice of transcription start and stop sites is pervasive in yeast. Relative to the exponential growth condition (i.e., YPAD), transcripts differing at the 5' ends and 3' ends are predicted to differ in their annotated start codon in 21 genes and their annotated stop codon in 63 genes. Many (431) upstream open reading frames (uORFs) are found in alternate 5' ends and are significantly enriched in transcripts produced during the salt response. Mutational analysis of five genes with uORFs revealed that two sets of uORFs increase the expression of a reporter construct, indicating a role in activation which had not been reported previously, whereas two other uORFs decreased expression. In addition, RNA binding protein motifs are statistically enriched for alternate ends under many conditions. Overall, these results demonstrate enormous diversity of transcript ends, and that this heterogeneity is regulated under different environmental conditions. Moreover, transcript end diversity has important biological implications for the regulation of gene expression. In addition, our data also serve as a valuable resource for the scientific community.

    View details for DOI 10.1534/g3.112.003640

    View details for Web of Science ID 000314881600019

    View details for PubMedID 23390610

    View details for PubMedCentralID PMC3564994

  • SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data GENOME RESEARCH Ouyang, Z., Snyder, M. P., Chang, H. Y. 2013; 23 (2): 377-387

    Abstract

    We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.

    View details for DOI 10.1101/gr.138545.112

    View details for PubMedID 23064747

  • Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs. Genome biology Kudron, M., Niu, W., Lu, Z., Wang, G., Gerstein, M., Snyder, M., Reinke, V. 2013; 14 (1): R5

    Abstract

    BACKGROUND: The tumor suppressor Rb/E2F regulates gene expression to control differentiation in multiple tissues during development, although how it directs tissue-specific gene regulation in vivo is poorly understood. RESULTS: We determined the genome-wide binding profiles for Caenorhabditis elegans Rb/E2F-like components in the germline, in the intestine and broadly throughout the soma, and uncovered highly tissue-specific binding patterns and target genes. Chromatin association by LIN-35, the C. elegans ortholog of Rb, is impaired in the germline but robust in the soma, a characteristic that might govern differential effects on gene expression in the two cell types. In the intestine, LIN-35 and the heterochromatin protein HPL-2, the ortholog of Hp1, coordinately bind at many sites lacking E2F. Finally, selected direct target genes contribute to the soma-to-germline transformation of lin-35 mutants, including mes-4, a soma-specific target that promotes H3K36 methylation, and csr-1, a germline-specific target that functions in a 22G small RNA pathway. CONCLUSIONS: In sum, identification of tissue-specific binding profiles and effector target genes reveals important insights into the mechanisms by which Rb/E2F controls distinct cell fates in vivo.

    View details for DOI 10.1186/gb-2013-14-1-r5

    View details for PubMedID 23347407

  • Two methods for full-length RNA sequencing for low quantities of cells and single cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Durrett, R. E., Zhu, H., Tanaka, Y., Li, Y., Zi, X., Marjani, S. L., Euskirchen, G., Ma, C., LaMotte, R. H., Park, I., Snyder, M. P., Mason, C. E., Weissman, S. M. 2013; 110 (2): 594-599

    Abstract

    The ability to determine the gene expression pattern in low quantities of cells or single cells is important for resolving a variety of problems in many biological disciplines. A robust description of the expression signature of a single cell requires determination of the full-length sequence of the expressed mRNAs in the cell, yet existing methods have either 3'-biased or variable transcript representation. Here, we report our protocols for the amplification and high-throughput sequencing of very small amounts of RNA using procedures of either semirandom primed PCR or phi29 DNA polymerase-based DNA amplification, for the cDNA generated with oligo-dT and/or random oligonucleotide primers. Unlike existing methods, these protocols produce relatively uniformly distributed sequences covering the full length of almost all transcripts independent of their sizes, from 1,000 to 10 cells, and even with single cells. Both protocols produced satisfactory detection/coverage of the abundant mRNAs from a single K562 erythroleukemic cell or a single dorsal root ganglion neuron. The phi29-based method produces long products with less noise, uses an isothermal reaction, and is simple to practice. The semirandom primed PCR procedure is more sensitive and reproducible at low transcript levels or with low quantities of cells. These methods provide tools for mRNA sequencing or RNA sequencing when only low quantities of cells, a single cell, or even degraded RNA are available for profiling.

    View details for DOI 10.1073/pnas.1217322109

    View details for Web of Science ID 000313906600047

    View details for PubMedID 23267071

    View details for PubMedCentralID PMC3545756

  • Multimodal Dynamic Profiling of Healthy and Diseased States for Future Personalized Health Care CLINICAL PHARMACOLOGY & THERAPEUTICS Mias, G. I., Snyder, M. 2013; 93 (1): 29-32

    View details for DOI 10.1038/clpt.2012.204

    View details for Web of Science ID 000312618200021

    View details for PubMedID 23187877

  • Integrative analysis of longitudinal metabolomics data from a personal multi-omics profile. Metabolites Stanberry, L., Mias, G. I., Haynes, W., Higdon, R., Snyder, M., Kolker, E. 2013; 3 (3): 741-760

    Abstract

    The integrative personal omics profile (iPOP) is a pioneering study that combines genomics, transcriptomics, proteomics, metabolomics and autoantibody profiles from a single individual over a 14-month period. The observation period includes two episodes of viral infection: a human rhinovirus and a respiratory syncytial virus. The profile studies give an informative snapshot into the biological functioning of an organism. We hypothesize that pathway expression levels are associated with disease status. To test this hypothesis, we use biological pathways to integrate metabolomics and proteomics iPOP data. The approach computes the pathways' differential expression levels at each time point, while taking into account the pathway structure and the longitudinal design. The resulting pathway levels show strong association with the disease status. Further, we identify temporal patterns in metabolite expression levels. The changes in metabolite expression levels also appear to be consistent with the disease status. The results of the integrative analysis suggest that changes in biological pathways may be used to predict and monitor the disease. The iPOP experimental design, data acquisition and analysis issues are discussed within the broader context of personal profiling.

    View details for DOI 10.3390/metabo3030741

    View details for PubMedID 24958148

    View details for PubMedCentralID PMC3901289

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311-?

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs GENOME BIOLOGY Kudron, M., Niu, W., Lu, Z., Wang, G., Gerstein, M., Snyder, M., Reinke, V. 2013; 14 (1)

    View details for DOI 10.1186/gb-2013-14-1-r5

    View details for Web of Science ID 000320155200005

  • High-throughput sequencing for biology and medicine MOLECULAR SYSTEMS BIOLOGY Soon, W. W., Hariharan, M., Snyder, M. P. 2013; 9

    Abstract

    Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches (including ever finer analyses of transcriptome dynamics, genome structure and genomic variation) and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.

    View details for DOI 10.1038/msb.2012.61

    View details for Web of Science ID 000314415800010

    View details for PubMedID 23340846

    View details for PubMedCentralID PMC3564260

  • Systematic investigation of protein-small molecule interactions IUBMB LIFE Li, X., Wang, X., Snyder, M. 2013; 65 (1): 2-8

    Abstract

    Cell signaling is extensively wired between cellular components to sustain cell proliferation, differentiation, and adaptation. The interaction network is often manifested in how protein function is regulated through interacting with other cellular components including small molecule metabolites. While many biochemical interactions have been established as reactions between protein enzymes and their substrates and products, much less is known at the system level about how small metabolites regulate protein functions through allosteric binding. In the past decade, the study of protein-small molecule interactions has been lagging behind other types of interactions. Recent technological advances have explored several high-throughput platforms to reveal many "unexpected" protein-small molecule interactions that could have a profound impact on our understanding of cell signaling. These interactions will help bridge gaps in existing regulatory loops of cell signaling and serve as new targets for medical intervention. In this review, we summarize recent advances in the systematic investigation of protein-metabolite/small molecule interactions, and discuss the potential impact of such studies on both biological research and medicine.

    View details for DOI 10.1002/iub.1111

    View details for Web of Science ID 000312886200002

    View details for PubMedID 23225626

  • A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17 JOURNAL OF PROTEOME RESEARCH Liu, S., Im, H., Bairoch, A., Cristofanilli, M., Chen, R., Deutsch, E. W., Dalton, S., Fenyo, D., Fanayan, S., Gates, C., Gaudet, P., Hincapie, M., Hanash, S., Kim, H., Jeong, S., Lundberg, E., Mias, G., Menon, R., Mu, Z., Nice, E., Paik, Y., Uhlen, M., Wells, L., Wu, S., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Omenn, G. S., Beavis, R. C., Hancock, W. S. 2013; 12 (1): 45-57

    Abstract

    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

    View details for DOI 10.1021/pr300985j

    View details for Web of Science ID 000313156300007

    View details for PubMedID 23259914

  • Exome sequencing by targeted enrichment. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Clark, M. J., Chen, R., Snyder, M. 2013; Chapter 7: Unit7 12-?

    Abstract

    This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.

    View details for DOI 10.1002/0471142727.mb0712s102

    View details for PubMedID 23547016

  • The variable somatic genome. Cell cycle O'Huallachain, M., Weissman, S. M., Snyder, M. P. 2013; 12 (1): 5-6

    View details for DOI 10.4161/cc.23069

    View details for PubMedID 23255102

    View details for PubMedCentralID PMC3570516

  • Promise of personalized omics to precision medicine WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE Chen, R., Snyder, M. 2013; 5 (1): 73-82

    Abstract

    The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.

    View details for DOI 10.1002/wsbm.1198

    View details for Web of Science ID 000312736200005

    View details for PubMedID 23184638

  • Centromere-Like Regions in the Budding Yeast Genome PLOS GENETICS Lefrancois, P., Auerbach, R. K., Yellman, C. M., Roeder, G. S., Snyder, M. 2013; 9 (1)

    Abstract

    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.

    View details for DOI 10.1371/journal.pgen.1003209

    View details for Web of Science ID 000314651500052

    View details for PubMedID 23349633

    View details for PubMedCentralID PMC3547844

  • Copy Number Variation detection from 1000 Genomes project exon capture sequencing data BMC BIOINFORMATICS Wu, J., Grzeda, K. R., Stewart, C., Grubert, F., Urban, A. E., Snyder, M. P., Marth, G. T. 2012; 13

    Abstract

    DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.

    View details for DOI 10.1186/1471-2105-13-305

    View details for Web of Science ID 000314688600001

    View details for PubMedID 23157288

    View details for PubMedCentralID PMC3563612

  • Whole Genome Sequence Analysis of Primary Myelofibrosis. 54th Annual Meeting and Exposition of the American Society of Hematology (ASH) Merker, J. D., Roskin, K., Ng, D., Pan, C., Fisk, D. G., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, M., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. AMER SOC HEMATOLOGY. 2012

  • Genome interpretation and assembly-recent progress and next steps. Nature biotechnology Baker, S., Joecker, A., Church, G., Snyder, M., West, J., Salzberg, S., Worthey, E., Smith, T., Wang, J., Reid, J. G. 2012; 30 (11): 1081-1083

    View details for DOI 10.1038/nbt.2425

    View details for PubMedID 23138307

  • Michael Snyder. Interview by Asher Mullard. Nature reviews. Drug discovery Snyder, M. 2012; 11 (10): 744-?

    View details for DOI 10.1038/nrd3867

    View details for PubMedID 23023673

  • Systems biology: personalized medicine for the future? CURRENT OPINION IN PHARMACOLOGY Chen, R., Snyder, M. 2012; 12 (5): 623-628

    Abstract

    Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughput sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.

    View details for DOI 10.1016/j.coph.2012.07.011

    View details for Web of Science ID 000310478800017

    View details for PubMedID 22858243

  • SWI/SNF Chromatin-remodeling Factors: Multiscale Analyses and Diverse Functions JOURNAL OF BIOLOGICAL CHEMISTRY Euskirchen, G., Auerbach, R. K., Snyder, M. 2012; 287 (37): 30897-30905

    Abstract

    Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.

    View details for DOI 10.1074/jbc.R111.309302

    View details for Web of Science ID 000308791300003

    View details for PubMedID 22952240

    View details for PubMedCentralID PMC3438922

  • Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements GENOME RESEARCH Kundaje, A., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M., Smith, C. L., Raha, D., Winters, E. E., Johnson, S. M., Snyder, M., Batzoglou, S., Sidow, A. 2012; 22 (9): 1735-1747

    Abstract

    Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

    View details for DOI 10.1101/gr.136366.111

    View details for PubMedID 22955985

  • A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells GENOME RESEARCH Charos, A. E., Reed, B. D., Raha, D., Szekely, A. M., Weissman, S. M., Snyder, M. 2012; 22 (9): 1668-1679

    Abstract

    PPARGC1A is a transcriptional coactivator that binds to and coactivates a variety of transcription factors (TFs) to regulate the expression of target genes. PPARGC1A plays a pivotal role in regulating energy metabolism and has been implicated in several human diseases, most notably type II diabetes. Previous studies have focused on the interplay between PPARGC1A and individual TFs, but little is known about how PPARGC1A combines with all of its partners across the genome to regulate transcriptional dynamics. In this study, we describe a core PPARGC1A transcriptional regulatory network operating in HepG2 cells treated with forskolin. We first mapped the genome-wide binding sites of PPARGC1A using chromatin-IP followed by high-throughput sequencing (ChIP-seq) and uncovered overrepresented DNA sequence motifs corresponding to known and novel PPARGC1A network partners. We then profiled six of these site-specific TF partners using ChIP-seq and examined their network connectivity and combinatorial binding patterns with PPARGC1A. Our analysis revealed extensive overlap of targets including a novel link between PPARGC1A and HSF1, a TF regulating the conserved heat shock response pathway that is misregulated in diabetes. Importantly, we found that different combinations of TFs bound to distinct functional sets of genes, thereby helping to reveal the combinatorial regulatory code for metabolic and other cellular processes. In addition, the different TFs often bound near the promoters and coding regions of each other's genes suggesting an intricate network of interdependent regulation. Overall, our study provides an important framework for understanding the systems-level control of metabolic gene expression in humans.

    View details for DOI 10.1101/gr.127761.111

    View details for Web of Science ID 000308272800009

    View details for PubMedID 22955979

    View details for PubMedCentralID PMC3431484

  • Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs GENOME RESEARCH Tilgner, H., Knowles, D. G., Johnson, R., Davis, C. A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T. R., Guigo, R. 2012; 22 (9): 1616-1625

    Abstract

    Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.

    View details for DOI 10.1101/gr.134445.111

    View details for Web of Science ID 000308272800004

    View details for PubMedID 22955974

    View details for PubMedCentralID PMC3431479

  • VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment BIOINFORMATICS Habegger, L., Balasubramanian, S., Chen, D. Z., Khurana, E., Sboner, A., Harmanci, A., Rozowsky, J., Clarke, D., Snyder, M., Gerstein, M. 2012; 28 (17): 2267-2269

    Abstract

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.

    View details for DOI 10.1093/bioinformatics/bts368

    View details for Web of Science ID 000308019200008

    View details for PubMedID 22743228

    View details for PubMedCentralID PMC3426844

  • Understanding transcriptional regulation by integrative analysis of transcription factor binding data GENOME RESEARCH Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K. Y., Rozowsky, J., Yan, K., Dong, X., Djebali, S., Ruan, Y., Davis, C. A., Carninci, P., Lassman, T., Gingeras, T. R., Guigo, R., Birney, E., Weng, Z., Snyder, M., Gerstein, M. 2012; 22 (9): 1658-1667

    Abstract

    Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.

    View details for DOI 10.1101/gr.136838.111

    View details for Web of Science ID 000308272800008

    View details for PubMedID 22955978

    View details for PubMedCentralID PMC3431483

  • Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors GENOME RESEARCH Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., Pierce, B. G., Dong, X., Kundaje, A., Cheng, Y., Rando, O. J., Birney, E., Myers, R. M., Noble, W. S., Snyder, M., Weng, Z. 2012; 22 (9): 1798-1812

    Abstract

    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.

    View details for DOI 10.1101/gr.139105.112

    View details for Web of Science ID 000308272800020

    View details for PubMedID 22955990

    View details for PubMedCentralID PMC3431495

  • A Genome-Scale Resource for In Vivo Tag-Based Protein Function Exploration in C. elegans CELL Sarov, M., Murray, J. I., Schanze, K., Pozniakovski, A., Niu, W., Angermann, K., Hasse, S., Rupprecht, M., Vinis, E., Tinney, M., Preston, E., Zinke, A., Enst, S., Teichgraber, T., Janette, J., Reis, K., Janosch, S., Schloissnig, S., Ejsmont, R. K., Slightam, C., Xu, X., Kim, S. K., Reinke, V., Stewart, A. F., Snyder, M., Waterston, R. H., Hyman, A. A. 2012; 150 (4): 855-866

    Abstract

    Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent- and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.

    View details for DOI 10.1016/j.cell.2012.08.001

    View details for Web of Science ID 000308002300018

    View details for PubMedID 22901814

  • Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis PLOS ONE Ma, S., Bachan, S., Porto, M., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2012; 7 (8)

    Abstract

    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

    View details for DOI 10.1371/journal.pone.0043198

    View details for Web of Science ID 000307500100069

    View details for PubMedID 22912824

    View details for PubMedCentralID PMC3418279

  • An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome biology Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., Ren, B., Gingeras, T., Gilbert, D. M., Groudine, M., Bender, M., Kaul, R., Canfield, T., Giste, E., Johnson, A., Zhang, M., Balasundaram, G., Byron, R., Roach, V., Sabo, P. J., Sandstrom, R., Stehling, A. S., Thurman, R. E., Weissman, S. M., Cayting, P., Hariharan, M., Lian, J., Cheng, Y., Landt, S. G., Ma, Z., Wold, B. J., Dekker, J., Crawford, G. E., Keller, C. A., Wu, W., Morrissey, C., Kumar, S. A., Mishra, T., Jain, D., Byrska-Bishop, M., Blankenberg, D., Lajoie, B. R., Jain, G., Sanyal, A., Chen, K. B., Denas, O., Taylor, J., Blobel, G. A., Weiss, M. J., Pimkin, M., Deng, W., Marinov, G. K., Williams, B. A., Fisher-Aylor, K. I., Desalvo, G., Kiralusha, A., Trout, D., Amrhein, H., Mortazavi, A., Edsall, L., McCleary, D., Kuan, S., Shen, Y., Yue, F., Ye, Z., Davis, C. A., Zaleski, C., Jha, S., Xue, C., Dobin, A., Lin, W., Fastuca, M., Wang, H., Guigo, R., Djebali, S., Lagarde, J., Ryba, T., Sasaki, T., Malladi, V. S., Cline, M. S., Kirkup, V. M., Learned, K., Rosenbloom, K. R., Kent, W. J., Feingold, E. A., Good, P. J., Pazin, M., Lowdon, R. F., Adams, L. B. 2012; 13 (8): 418

    Abstract

    To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

    View details for DOI 10.1186/gb-2012-13-8-418

    View details for PubMedID 22889292

    View details for PubMedCentralID PMC3491367

  • Investigating metabolite-protein interactions: An overview of available techniques METHODS Yang, G. X., Li, X., Snyder, M. 2012; 57 (4): 459-466

    Abstract

    Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.

    View details for DOI 10.1016/j.ymeth.2012.06.013

    View details for Web of Science ID 000309625600009

    View details for PubMedID 22750303

    View details for PubMedCentralID PMC3448827

  • Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy SCIENCE TRANSLATIONAL MEDICINE Sun, N., Yazawa, M., Liu, J., Han, L., Sanchez-Freire, V., Abilez, O. J., Navarrete, E. G., Hu, S., Wang, L., Lee, A., Pavlovic, A., Lin, S., Chen, R., Hajjar, R. J., Snyder, M. P., Dolmetsch, R. E., Butte, M. J., Ashley, E. A., Longaker, M. T., Robbins, R. C., Wu, J. C. 2012; 4 (130)

    Abstract

    Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric α-actinin. When stimulated with a β-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric α-actinin distribution. Treatment with β-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.

    View details for DOI 10.1126/scitranslmed.3003552

    View details for Web of Science ID 000303045900004

    View details for PubMedID 22517884

    View details for PubMedCentralID PMC3657516

  • Extensive In vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses Experimental Biology Meeting 2012 Snyder, M., Li, X., Gianoulis, T., Yip, K., Gerstein, M. FEDERATION AMER SOC EXP BIOL. 2012
  • A core erythroid transcriptional network is repressed by a master regulator of myelo-lymphoid differentiation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wontakal, S. N., Guo, X., Smith, C., MacCarthy, T., Bresnick, E. H., Bergman, A., Snyder, M. P., Weissman, S. M., Zheng, D., Skoultchi, A. I. 2012; 109 (10): 3832-3837

    Abstract

    Two mechanisms that play important roles in cell fate decisions are control of a "core transcriptional network" and repression of alternative transcriptional programs by antagonizing transcription factors. Whether these two mechanisms operate together is not known. Here we report that GATA-1, SCL, and Klf1 form an erythroid core transcriptional network by co-occupying >300 genes. Importantly, we find that PU.1, a negative regulator of terminal erythroid differentiation, is a highly integrated component of this network. GATA-1, SCL, and Klf1 act to promote, whereas PU.1 represses expression of many of the core network genes. PU.1 also represses the genes encoding GATA-1, SCL, Klf1, and important GATA-1 cofactors. Conversely, in addition to repressing PU.1 expression, GATA-1 also binds to and represses >100 PU.1 myelo-lymphoid gene targets in erythroid progenitors. Mathematical modeling further supports that this dual mechanism of repressing both the opposing upstream activator and its downstream targets provides a synergistic, robust mechanism for lineage specification. Taken together, these results amalgamate two key developmental principles, namely, regulation of a core transcriptional network and repression of an alternative transcriptional program, thereby enhancing our understanding of the mechanisms that establish cellular identity.

    View details for DOI 10.1073/pnas.1121019109

    View details for Web of Science ID 000301117700049

    View details for PubMedID 22357756

    View details for PubMedCentralID PMC3309740

  • Tcf7 Is an Important Regulator of the Switch of Self-Renewal and Differentiation in a Multipotential Hematopoietic Cell Line PLOS GENETICS Wu, J. Q., Seay, M., Schulz, V. P., Hariharan, M., Tuck, D., Lian, J., Du, J., Shi, M., Ye, Z., Gerstein, M., Snyder, M. P., Weissman, S. 2012; 8 (3)

    Abstract

    A critical problem in biology is understanding how cells choose between self-renewal and differentiation. To generate a comprehensive view of the mechanisms controlling early hematopoietic precursor self-renewal and differentiation, we used systems-based approaches and murine EML multipotential hematopoietic precursor cells as a primary model. EML cells give rise to a mixture of self-renewing Lin-SCA+CD34+ cells and partially differentiated non-renewing Lin-SCA-CD34- cells in a cell autonomous fashion. We identified and validated the HMG box protein TCF7 as a regulator in this self-renewal/differentiation switch that operates in the absence of autocrine Wnt signaling. We found that Tcf7 is the most down-regulated transcription factor when CD34+ cells switch into CD34- cells, using RNA-Seq. We subsequently identified the target genes bound by TCF7, using ChIP-Seq. We show that TCF7 and RUNX1 (AML1) bind to each other's promoter regions and that TCF7 is necessary for the production of the short isoforms, but not the long isoforms of RUNX1, suggesting that TCF7 and the short isoforms of RUNX1 function coordinately in regulation. Tcf7 knock-down experiments and Gene Set Enrichment Analyses suggest that TCF7 plays a dual role in promoting the expression of genes characteristic of self-renewing CD34+ cells while repressing genes activated in the partially differentiated CD34- state. Finally, a network of up-regulated transcription factors of CD34+ cells was constructed. Factors that control hematopoietic stem cell (HSC) establishment and development, cell growth, and multipotency were identified. These studies in EML cells demonstrate fundamental cell-intrinsic properties of the switch between self-renewal and differentiation, and yield valuable insights for manipulating HSCs and other differentiating systems.

    View details for DOI 10.1371/journal.pgen.1002565

    View details for Web of Science ID 000302254800041

    View details for PubMedID 22412390

    View details for PubMedCentralID PMC3297581

  • The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY Paik, Y., Jeong, S., Omenn, G. S., Uhlen, M., Hanash, S., Cho, S. Y., Lee, H., Na, K., Choi, E., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Cheng, Y., Chen, R., Marko-Varga, G., Deutsch, E. W., Kim, H., Kwon, J., Aebersold, R., Bairoch, A., Taylor, A. D., Kim, K. Y., Lee, E., Hochstrasser, D., Legrain, P., Hancock, W. S. 2012; 30 (3): 221-223

    View details for Web of Science ID 000301303800011

    View details for PubMedID 22398612

  • Correlation of Global MicroRNA Expression With Basal Cell Carcinoma Subtype G3-GENES GENOMES GENETICS Heffelfinger, C., Ouyang, Z., Engberg, A., Leffell, D. J., Hanlon, A. M., Gordon, P. B., Zheng, W., Zhao, H., Snyder, M. P., Bale, A. E. 2012; 2 (2): 279-286

    Abstract

    Basal cell carcinomas (BCCs) are the most common cancers in the United States. The histologic appearance distinguishes several subtypes, each of which can have a different biologic behavior. In this study, global miRNA expression was quantified by high-throughput sequencing in nodular BCCs, a subtype that is slow growing, and infiltrative BCCs, aggressive tumors that extend through the dermis and invade structures such as cutaneous nerves. Principal components analysis correctly classified seven of eight infiltrative tumors on the basis of miRNA expression. The remaining tumor, on pathology review, contained a mixture of nodular and infiltrative elements. Nodular tumors did not cluster tightly, likely reflecting broader histopathologic diversity in this class, but trended toward forming a group separate from infiltrative BCCs. Quantitative polymerase chain reaction assays were developed for six of the miRNAs that showed significant differences between the BCC subtypes, and five of these six were validated in a replication set of four infiltrative and three nodular tumors. The expression level of miR-183, a miRNA that inhibits invasion and metastasis in several types of malignancies, was consistently lower in infiltrative than nodular tumors and could be one element underlying the difference in invasiveness. These results represent the first miRNA profiling study in BCCs and demonstrate that miRNA gene expression may be involved in tumor pathogenesis and particularly in determining the aggressiveness of these malignancies.

    View details for DOI 10.1534/g3.111.001115

    View details for Web of Science ID 000312411000015

    View details for PubMedID 22384406

    View details for PubMedCentralID PMC3284335

  • Characterization of Enhancer Function from Genome-Wide Analyses ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 13 Maston, G. A., Landt, S. G., Snyder, M., Green, M. R. 2012; 13: 29-57

    Abstract

    There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.

    View details for DOI 10.1146/annurev-genom-090711-163723

    View details for Web of Science ID 000310143800002

    View details for PubMedID 22703170

  • Deciphering DNA Sequence Information GENOME ORGANIZATION AND FUNCTION IN THE CELL NUCLEUS Kaganovich, M., Snyder, M., Rippe, K. 2012: 1–20
  • Q & A: the Snyderome GENOME BIOLOGY Snyder, M. 2012; 13 (3)

    Abstract

    Michael Snyder answers Genome Biology's questions on the human and professional stories underlying his Snyderome integrative omics project.

    View details for DOI 10.1186/gb-2012-13-3-147

    View details for Web of Science ID 000308544200010

    View details for PubMedID 22424393

    View details for PubMedCentralID PMC3439959

  • Phosphorylation of Yeast Transcription Factors Correlates with the Evolution of Novel Sequence and Function JOURNAL OF PROTEOME RESEARCH Kaganovich, M., Snyder, M. 2012; 11 (1): 261-268

    Abstract

    Gene duplication is a significant source of novel genes and the dynamics of gene duplicate retention vs loss are poorly understood, particularly in terms of the functional and regulatory specialization of their gene products. We compiled a comprehensive data set of S. cerevisiae phosphosites to study the role of phosphorylation in yeast paralog divergence. We found that proteins coded by duplicated genes created in the Whole Genome Duplication (WGD) event and in a period prior to the WGD are significantly more phosphorylated than other duplicates or singletons. Though the amino acid sequence of each paralog of a given pair tends to diverge fairly similarly from their common ortholog in a related species, the phosphorylated amino acids tend to diverge in sequence from the ortholog at different rates. We observed that transcription factors (TFs) are disproportionately present among the set of duplicate genes and among the set of proteins that are phosphorylated. Interestingly, TFs that occur on higher levels of the transcription network hierarchy (i.e., tend to regulate other TFs) tend to be more phosphorylated than lower-level TFs. We found that TF paralog divergence in expression, binding, and sequence correlates with the abundance of phosphosites. Overall, these studies have important implications for understanding divergence of gene function and regulation in eukaryotes.

    View details for DOI 10.1021/pr201065k

    View details for Web of Science ID 000298827700024

    View details for PubMedID 22141333

  • Interpretome: a freely available, modular, and secure personal genome interpretation engine. Pacific Symposium on Biocomputing Karczewski, K. J., Tirrell, R. P., Cordero, P., Tatonetti, N. P., Dudley, J. T., Salari, K., Snyder, M., Altman, R. B., Kim, S. K. 2012: 339-350

    Abstract

    The decreasing cost of genotyping and genome sequencing has ushered in an era of genomic personalized medicine. More than 100,000 individuals have been genotyped by direct-to-consumer genetic testing services, which offer a glimpse into the interpretation and exploration of a personal genome. However, these interpretations, which require extensive manual curation, are subject to the preferences of the company and are not customizable by the individual. Academic institutions teaching personalized medicine, as well as genetic hobbyists, may prefer to customize their analysis and have full control over the content and method of interpretation. We present the Interpretome, a system for private genome interpretation, which contains all genotype information in client-side interpretation scripts, supported by server-side databases. We provide state-of-the-art analyses for teaching clinical implications of personal genomics, including disease risk assessment and pharmacogenomics. Additionally, we have implemented client-side algorithms for ancestry inference, demonstrating the power of these methods without excessive computation. Finally, the modular nature of the system allows for plugin capabilities for custom analyses. This system will allow for personal genome exploration without compromising privacy, facilitating hands-on courses in genomics and personalized medicine.

    View details for PubMedID 22174289

  • Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors GENOME BIOLOGY Yip, K. Y., Cheng, C., Bhardwaj, N., Brown, J. B., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M., Gerstein, M. 2012; 13 (9)

    Abstract

    Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.

    View details for DOI 10.1186/gb-2012-13-9-r48

    View details for Web of Science ID 000313182600001

    View details for PubMedID 22950945

    View details for PubMedCentralID PMC3491392

  • A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster PLOS GENETICS Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2011; 7 (12)

    Abstract

    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.

    View details for DOI 10.1371/journal.pgen.1002380

    View details for Web of Science ID 000299167900003

    View details for PubMedID 22194694

  • Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms PLOS ONE Haraksingh, R. R., Abyzov, A., Gerstein, M., Urban, A. E., Snyder, M. 2011; 6 (11)

    Abstract

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous, array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.

    View details for DOI 10.1371/journal.pone.0027859

    View details for Web of Science ID 000298168100021

    View details for PubMedID 22140474

    View details for PubMedCentralID PMC3227574

  • Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data PLOS COMPUTATIONAL BIOLOGY Cheng, C., Yan, K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z. J., Niu, W., Alves, P., Kato, M., Snyder, M., Gerstein, M. 2011; 7 (11)

    Abstract

    We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly across various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.

    View details for DOI 10.1371/journal.pcbi.1002190

    View details for Web of Science ID 000297263700001

    View details for PubMedID 22125477

    View details for PubMedCentralID PMC3219617

  • Performance comparison of exome DNA sequencing technologies NATURE BIOTECHNOLOGY Clark, M. J., Chen, R., Lam, H. Y., Karczewski, K. J., Chen, R., Euskirchen, G., Butte, A. J., Snyder, M. 2011; 29 (10): 908-U206

    Abstract

    Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

    View details for DOI 10.1038/nbt.1975

    View details for Web of Science ID 000296273000017

    View details for PubMedID 21947028

  • Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence PLOS GENETICS Dewey, F. E., Chen, R., Cordero, S. P., Ormond, K. E., Caleshu, C., Karczewski, K. J., Whirl-Carrillo, M., Wheeler, M. T., Dudley, J. T., Byrnes, J. K., Cornejo, O. E., Knowles, J. W., Woon, M., Sangkuhl, K., Gong, L., Thorn, C. F., Hebert, J. M., Capriotti, E., David, S. P., Pavlovic, A., West, A., Thakuria, J. V., Ball, M. P., Zaranek, A. W., Rehm, H. L., Church, G. M., West, J. S., Bustamante, C. D., Snyder, M., Altman, R. B., Klein, T. E., Butte, A. J., Ashley, E. A. 2011; 7 (9)

    Abstract

    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

    View details for DOI 10.1371/journal.pgen.1002280

    View details for PubMedID 21935354

  • Arabidopsis RTNLB1 and RTNLB2 Reticulon-Like Proteins Regulate Intracellular Trafficking and Activity of the FLS2 Immune Receptor PLANT CELL Lee, H. Y., Bowen, C. H., Popescu, G. V., Kang, H., Kato, N., Ma, S., Dinesh-Kumar, S., Snyder, M., Popescu, S. C. 2011; 23 (9): 3374-3391

    Abstract

    Receptors localized at the plasma membrane are critical for the recognition of pathogens. The molecular determinants that regulate receptor transport to the plasma membrane are poorly understood. In a screen for proteins that interact with the FLAGELLIN-SENSITIVE2 (FLS2) receptor using Arabidopsis thaliana protein microarrays, we identified the reticulon-like protein RTNLB1. We showed that FLS2 interacts in vivo with both RTNLB1 and its homolog RTNLB2 and that a Ser-rich region in the N-terminal tail of RTNLB1 is critical for the interaction with FLS2. Transgenic plants that lack RTNLB1 and RTNLB2 (rtnlb1 rtnlb2) or overexpress RTNLB1 (RTNLB1ox) exhibit reduced activation of FLS2-dependent signaling and increased susceptibility to pathogens. In both rtnlb1 rtnlb2 and RTNLB1ox, FLS2 accumulation at the plasma membrane was significantly affected compared with the wild type. Transient overexpression of RTNLB1 led to FLS2 retention in the endoplasmic reticulum (ER) and affected FLS2 glycosylation but not FLS2 stability. Removal of the critical N-terminal Ser-rich region or either of the two Tyr-dependent sorting motifs from RTNLB1 causes partial reversion of the negative effects of excess RTNLB1 on FLS2 transport out of the ER and accumulation at the membrane. The results are consistent with a model whereby RTNLB1 and RTNLB2 regulate the transport of newly synthesized FLS2 to the plasma membrane.

    View details for DOI 10.1105/tpc.111.089656

    View details for Web of Science ID 000296739100025

    View details for PubMedID 21949153

    View details for PubMedCentralID PMC3203430

  • Cooperative transcription factor associations discovered using regulatory variation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Karczewski, K. J., Tatonetti, N. P., Landt, S. G., Yang, X., Slifer, T., Altman, R. B., Snyder, M. 2011; 108 (32): 13353-13358

    Abstract

    Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NFκB. Our method successfully identifies factors that have been known to work with NFκB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.

    View details for DOI 10.1073/pnas.1103105108

    View details for Web of Science ID 000293691400076

    View details for PubMedID 21828005

    View details for PubMedCentralID PMC3156166

  • A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans PLOS GENETICS Stewart, C., Kural, D., Stroemberg, M. P., Walker, J. A., Konkel, M. K., Stuetz, A. M., Urban, A. E., Grubert, F., Lam, H. Y., Lee, W., Busby, M., Indap, A. R., Garrison, E., Huff, C., Xing, J., Snyder, M. P., Jorde, L. B., Batzer, M. A., Korbel, J. O., Marth, G. T. 2011; 7 (8)

    Abstract

    As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.

    View details for DOI 10.1371/journal.pgen.1002236

    View details for Web of Science ID 000294297000031

    View details for PubMedID 21876680

    View details for PubMedCentralID PMC3158055

  • AlleleSeq: analysis of allele-specific expression and binding in a network framework MOLECULAR SYSTEMS BIOLOGY Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N., Bhardwaj, N., Rubin, M., Snyder, M., Gerstein, M. 2011; 7

    Abstract

    To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.

    View details for DOI 10.1038/msb.2011.54

    View details for Web of Science ID 000294537800003

    View details for PubMedID 21811232

    View details for PubMedCentralID PMC3208341
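
    The abstract above describes calling allele-specific events from "significant differences in the number of mapped reads between maternal and paternal alleles." A minimal sketch of one way such a comparison could be made is an exact two-sided binomial test against an equal-allele expectation. This is an illustrative assumption, not the published AlleleSeq procedure; the function name and the 0.5 null proportion are hypothetical choices.

```python
from math import comb


def allele_bias_p_value(maternal_reads: int, paternal_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Hypothetical helper: assumes reads are independent draws and that,
    with no allele-specific effect, each read is maternal with
    probability p_null.
    """
    n = maternal_reads + paternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of every possible maternal read count under the null.
    probs = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = probs[maternal_reads]
    # Sum the outcomes that are at most as likely as the observed one.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(allele_bias_p_value(5, 5))   # balanced counts
    print(allele_bias_p_value(10, 0))  # fully skewed counts
```

    In practice the p-values from many sites would still need multiple-testing correction, and reference-allele mapping bias (which the abstract mentions) would have to be handled before counting reads.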

  • Identification of genomic indels and structural variations using split reads BMC GENOMICS Zhang, Z. D., Du, J., Lam, H., Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 12

    Abstract

    Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

    View details for DOI 10.1186/1471-2164-12-375

    View details for Web of Science ID 000294205500001

    View details for PubMedID 21787423

    View details for PubMedCentralID PMC3161018
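
    The abstract above describes calibrating calls against simulated data to obtain sensitivity and positive predictive value, then combining these with observed event counts to estimate a total. A minimal sketch of that arithmetic follows. The function names and the correction formula (observed calls × PPV ÷ sensitivity) are assumptions made for illustration and are not taken from the SRiC paper.

```python
def sensitivity_and_ppv(simulated_events: set, called_events: set) -> tuple:
    """Compare calls made on a simulated data set against the planted truth.

    Hypothetical helper: events are any hashable identifiers
    (for example, breakpoint coordinates).
    """
    true_positives = len(simulated_events & called_events)
    sensitivity = true_positives / len(simulated_events) if simulated_events else 0.0
    ppv = true_positives / len(called_events) if called_events else 0.0
    return sensitivity, ppv


def corrected_event_count(observed_calls: int, sensitivity: float, ppv: float) -> float:
    """Estimate the true number of events from an observed call count.

    Assumed correction: discard the expected false positives (multiply by
    PPV), then scale up for the events that were missed (divide by
    sensitivity).
    """
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return observed_calls * ppv / sensitivity


if __name__ == "__main__":
    planted = {1, 2, 3, 4}
    called = {2, 3, 4, 5, 6}
    sens, ppv = sensitivity_and_ppv(planted, called)
    print(sens, ppv)
    print(corrected_event_count(100, sens, ppv))
```

    A real calibration would be run separately per event class and length bin, as the abstract indicates, since both quantities vary across those categories.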

  • Metabolites as global regulators: A new view of protein regulation BIOESSAYS Li, X., Snyder, M. 2011; 33 (7): 485-489

    View details for DOI 10.1002/bies.201100026

    View details for Web of Science ID 000292710500002

    View details for PubMedID 21495048

  • The Human Proteome Project: Current State and Future Direction MOLECULAR & CELLULAR PROTEOMICS Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C. H., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Wu, C. H., Yamamoto, T., Paik, Y., Omenn, G. S. 2011; 10 (7)

    Abstract

    After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort will be necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP research groups will use the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge bases. The HPP participants will take advantage of the output and cross-analyses from the ongoing Human Proteome Organization initiatives and a chromosome-centric protein mapping strategy, termed C-HPP, with which many national teams are currently engaged. In addition, numerous biologically driven and disease-oriented projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents, and tools for protein studies and analyses, and a stronger basis for personalized medicine. The Human Proteome Organization urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.M111.009993

    View details for Web of Science ID 000292541500012

    View details for PubMedID 21742803

    View details for PubMedCentralID PMC3134076

  • Landscape of Next-Generation Sequencing Technologies ANALYTICAL CHEMISTRY Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P., Barron, A. E. 2011; 83 (12): 4327-4341

    View details for DOI 10.1021/ac2010857

    View details for Web of Science ID 000291499800001

    View details for PubMedID 21612267

    View details for PubMedCentralID PMC3437308

  • A Large Gene Network in Immature Erythroid Cells Is Controlled by the Myeloid and B Cell Transcriptional Regulator PU.1 PLOS GENETICS Wontakal, S. N., Guo, X., Will, B., Shi, M., Raha, D., Mahajan, M. C., Weissman, S., Snyder, M., Steidl, U., Zheng, D., Skoultchi, A. I. 2011; 7 (6)

    Abstract

    PU.1 is a hematopoietic transcription factor that is required for the development of myeloid and B cells. PU.1 is also expressed in erythroid progenitors, where it blocks erythroid differentiation by binding to and inhibiting the main erythroid promoting factor, GATA-1. However, other mechanisms by which PU.1 affects the fate of erythroid progenitors have not been thoroughly explored. Here, we used ChIP-Seq analysis for PU.1 and gene expression profiling in erythroid cells to show that PU.1 regulates an extensive network of genes that constitute major pathways for controlling growth and survival of immature erythroid cells. By analyzing fetal liver erythroid progenitors from mice with low PU.1 expression, we also show that the earliest erythroid committed cells are dramatically reduced in vivo. Furthermore, we find that PU.1 also regulates many of the same genes and pathways in other blood cells, leading us to propose that PU.1 is a multifaceted factor with overlapping, as well as distinct, functions in several hematopoietic lineages.

    View details for DOI 10.1371/journal.pgen.1001392

    View details for Web of Science ID 000292386300004

    View details for PubMedID 21695229

    View details for PubMedCentralID PMC3111485

  • CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing GENOME RESEARCH Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 21 (6): 974-984

    Abstract

    Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.

    View details for DOI 10.1101/gr.114876.110

    View details for Web of Science ID 000291153400017

    View details for PubMedID 21324876

    View details for PubMedCentralID PMC3106330

  • Genome-wide chromatin occupancy analysis reveals a role for ASH2 in transcriptional pausing NUCLEIC ACIDS RESEARCH Perez-Lluch, S., Blanco, E., Carbonell, A., Raha, D., Snyder, M., Serras, F., Corominas, M. 2011; 39 (11): 4628-4639

    Abstract

    An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.

    View details for DOI 10.1093/nar/gkq1322

    View details for Web of Science ID 000291755000015

    View details for PubMedID 21310711

    View details for PubMedCentralID PMC3113561

  • The human proteome project: Current state and future direction. Molecular & cellular proteomics : MCP Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Hu, C. H., Yamamoto, T., Paik, Y. K., Omenn, G. S. 2011

    Abstract

    After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the presence of about 30% undisclosed proteins out of 20,300 protein gene products, a systematic global effort is necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP groups employ the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge base. The HPP participants will take advantage of the output and cross-analyses from the ongoing HUPO initiatives and a chromosome-based protein mapping strategy, termed C-HPP, with many national teams currently engaged. In addition, numerous biologically-driven projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents and tools for protein studies and analyses, and a stronger basis for personalized medicine. HUPO urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in an HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.O111.009993

    View details for PubMedID 21531903

  • Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes GENES & DEVELOPMENT Fasolo, J., Sboner, A., Sun, M. G., Yu, H., Chen, R., Sharon, D., Kim, P. M., Gerstein, M., Snyder, M. 2011; 25 (7): 767-778

    Abstract

    Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.

    View details for DOI 10.1101/gad.1998811

    View details for Web of Science ID 000289062700010

    View details for PubMedID 21460040

    View details for PubMedCentralID PMC3070938

  • A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY Myers, R. M., Stamatoyannopoulos, J., Snyder, M., Dunham, I., Hardison, R. C., Bernstein, B. E., Gingeras, T. R., Kent, W. J., Birney, E., Wold, B., Crawford, G. E., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Mikkelsen, T. S., Kheradpour, P., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Thanh Truong, T., Ward, L. D., Altshuler, R. C., Lin, M. F., Kellis, M., Gingeras, T. R., Davis, C. A., Kapranov, P., Dobin, A., Zaleski, C., Schlesinger, F., Batut, P., Chakrabortty, S., Jha, S., Lin, W., Drenkow, J., Wang, H., Bell, K., Gao, H., Bell, I., Dumais, E., Dumais, J., Antonarakis, S. E., Ucla, C., Borel, C., Guigo, R., Djebali, S., Lagarde, J., Kingswood, C., Ribeca, P., Sammeth, M., Alioto, T., Merkel, A., Tilgner, H., Carninci, P., Hayashizaki, Y., Lassmann, T., Takahashi, H., Abdelhamid, R. F., Hannon, G., Fejes-Toth, K., Preall, J., Gordon, A., Sotirova, V., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Ruan, Y., Ruan, X., Shahab, A., Poh, W. T., Wei, C., Crawford, G. E., Furey, T. S., Boyle, A. P., Sheffield, N. C., Song, L., Shibata, Y., Vales, T., Winter, D., Zhang, Z., London, D., Wang, T., Birney, E., Keefe, D., Iyer, V. R., Lee, B., McDaniell, R. M., Liu, Z., Battenhouse, A., Bhinge, A. A., Lieb, J. D., Grasfeder, L. L., Showers, K. A., Giresi, P. G., Kim, S. K., Shestak, C., Myers, R. M., Pauli, F., Reddy, T. E., Gertz, J., Partridge, E. C., Jain, P., Sprouse, R. O., Bansal, A., Pusey, B., Muratet, M. A., Varley, K. E., Bowling, K. M., Newberry, K. M., Nesmith, A. S., Dilocker, J. A., Parker, S. L., Waite, L. L., Thibeault, K., Roberts, K., Absher, D. M., Wold, B., Mortazavi, A., Williams, B., Marinov, G., Trout, D., Pepke, S., King, B., McCue, K., Kirilusha, A., DeSalvo, G., Fisher-Aylor, K., Amrhein, H., Vielmetter, J., Sherlock, G., Sidow, A., Batzoglou, S., Rauch, R., Kundaje, A., Libbrecht, M., Margulies, E. H., Parker, S. 
C., Elnitski, L., Green, E. D., Hubbard, T., Harrow, J., Searle, S., Kokocinski, F., Aken, B., Frankish, A., Hunt, T., Despacio-Reyes, G., Kay, M., Mukherjee, G., Bignell, A., Saunders, G., Boychenko, V., Brent, M., van Baren, M. J., Brown, R. H., Gerstein, M., Khurana, E., Balasubramanian, S., Zhang, Z., Lam, H., Cayting, P., Robilotto, R., Lu, Z., Guigo, R., Derrien, T., Tanzer, A., Knowles, D. G., Mariotti, M., Kent, W. J., Haussler, D., Harte, R., Diekhans, M., Kellis, M., Lin, M., Kheradpour, P., Ernst, J., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Valencia, A., Tress, M., Manuel Rodriguez, J., Snyder, M., Landt, S. G., Raha, D., Shi, M., Euskirchen, G., Grubert, F., Kasowski, M., Lian, J., Cayting, P., Lacroute, P., Xu, Y., Monahan, H., Patacsil, D., Slifer, T., Yang, X., Charos, A., Reed, B., Wu, L., Auerbach, R. K., Habegger, L., Hariharan, M., Rozowsky, J., Abyzov, A., Weissman, S. M., Gerstein, M., Struhl, K., Lamarre-Vincent, N., Lindahl-Allen, M., Miotto, B., Moqtaderi, Z., Fleming, J. D., Newburger, P., Farnham, P. J., Frietze, S., O'Geen, H., Xu, X., Blahnik, K. R., Cao, A. R., Iyengar, S., Stamatoyannopoulos, J. A., Kaul, R., Thurman, R. E., Wang, H., Navas, P. A., Sandstrom, R., Sabo, P. J., Weaver, M., Canfield, T., Lee, K., Neph, S., Roach, V., Reynolds, A., Johnson, A., Rynes, E., Giste, E., Vong, S., Neri, J., Frum, T., Johnson, E. M., Nguyen, E. D., Ebersol, A. K., Sanchez, M. E., Sheffer, H. H., Lotakis, D., Haugen, E., Humbert, R., Kutyavin, T., Shafer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Kent, W. J., Rosenbloom, K. R., Dreszer, T. R., Raney, B. J., Barber, G. P., Meyer, L. R., Sloan, C. A., Malladi, V. S., Cline, M. S., Learned, K., Swing, V. K., Zweig, A. S., Rhead, B., Fujita, P. A., Roskin, K., Karolchik, D., Kuhn, R. M., Haussler, D., Birney, E., Dunham, I., Wilder, S. P., Keefe, D., Sobral, D., Herrero, J., Beal, K., Lukk, M., Brazma, A., Vaquerizas, J. M., Luscombe, N. M., Bickel, P. J., Boley, N., Brown, J. 
B., Li, Q., Huang, H., Gerstein, M., Habegger, L., Sboner, A., Rozowsky, J., Auerbach, R. K., Yip, K. Y., Cheng, C., Yan, K., Bhardwaj, N., Wang, J., Lochovsky, L., Jee, J., Gibson, T., Leng, J., Du, J., Hardison, R. C., Harris, R. S., Song, G., Miller, W., Haussler, D., Roskin, K., Suh, B., Wang, T., Paten, B., Noble, W. S., Hoffman, M. M., Buske, O. J., Weng, Z., Dong, X., Wang, J., Xi, H., Tenenbaum, S. A., Doyle, F., Penalva, L. O., Chittur, S., Tullius, T. D., Parker, S. C., White, K. P., Karmakar, S., Victorsen, A., Jameel, N., Bild, N., Grossman, R. L., Snyder, M., Landt, S. G., Yang, X., Patacsil, D., Slifer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Trinklein, N. D., Partridge, E. C., Myers, R. M., Giddings, M. C., Chen, X., Khatun, J., Maier, C., Yu, Y., Gunawardena, H., Risk, B., Feingold, E. A., Lowdon, R. F., Dillon, L. A., Good, P. J. 2011; 9 (4)

    Abstract

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    View details for DOI 10.1371/journal.pbio.1001046

    View details for Web of Science ID 000289938900014

  • Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches PLOS GENETICS Euskirchen, G. M., Auerbach, R. K., Davidov, E., Gianoulis, T. A., Zhong, G., Rozowsky, J., Bhardwaj, N., Gerstein, M. B., Snyder, M. 2011; 7 (3)

    Abstract

    A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly, we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together, the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.

    View details for DOI 10.1371/journal.pgen.1002008

    View details for Web of Science ID 000288996600042

    View details for PubMedID 21408204

    View details for PubMedCentralID PMC3048368

  • Mapping copy number variation by population-scale genome sequencing NATURE Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., Abyzov, A., Yoon, S. C., Ye, K., Cheetham, R. K., Chinwalla, A., Conrad, D. F., Fu, Y., Grubert, F., Hajirasouliha, I., Hormozdiari, F., Iakoucheva, L. M., Iqbal, Z., Kang, S., Kidd, J. M., Konkel, M. K., Korn, J., Khurana, E., Kural, D., Lam, H. Y., Leng, J., Li, R., Li, Y., Lin, C., Luo, R., Mu, X. J., Nemesh, J., Peckham, H. E., Rausch, T., Scally, A., Shi, X., Stromberg, M. P., Stuetz, A. M., Urban, A. E., Walker, J. A., Wu, J., Zhang, Y., Zhang, Z. D., Batzer, M. A., Ding, L., Marth, G. T., McVean, G., Sebat, J., Snyder, M., Wang, J., Ye, K., Eichler, E. E., Gerstein, M. B., Hurles, M. E., Lee, C., McCarroll, S. A., Korbel, J. O. 2011; 470 (7332): 59-65

    Abstract

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

    View details for DOI 10.1038/nature09708

    View details for Web of Science ID 000286886400033

    View details for PubMedID 21293372

    View details for PubMedCentralID PMC3077050

  • Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data GENOME RESEARCH Lu, Z. J., Yip, K. Y., Wang, G., Shou, C., Hillier, L. W., Khurana, E., Agarwal, A., Auerbach, R., Rozowsky, J., Cheng, C., Kato, M., Miller, D. M., Slack, F., Snyder, M., Waterston, R. H., Reinke, V., Gerstein, M. B. 2011; 21 (2): 276-285

    Abstract

    We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.

    View details for DOI 10.1101/gr.110189.110

    View details for Web of Science ID 000286804100013

    View details for PubMedID 21177971

    View details for PubMedCentralID PMC3032931
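    The incRNA abstract above reports classifier accuracy as an AUC on a validation set. As a reminder of what that number measures, the sketch below computes AUC directly as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The function name `auc` and the example scores are hypothetical and are not taken from the incRNA software or data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    scores_pos: classifier scores for true ncRNAs (hypothetical input).
    scores_neg: classifier scores for other genomic elements (hypothetical input).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")

    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))
```

    The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be preferable for large validation sets.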

  • Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans GENOME RESEARCH Niu, W., Lu, Z. J., Zhong, M., Sarov, M., Murray, J. I., Brdlik, C. M., Janette, J., Chen, C., Alves, P., Preston, E., Slightham, C., Jiang, L., Hyman, A. A., Kim, S. K., Waterston, R. H., Gerstein, M., Snyder, M., Reinke, V. 2011; 21 (2): 245-254

    Abstract

    Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors to target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor and one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding target genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors--LIN-39, MAB-5, and EGL-5--indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.

    View details for DOI 10.1101/gr.114587.110

    View details for Web of Science ID 000286804100010

    View details for PubMedID 21177963

    View details for PubMedCentralID PMC3032928

  • RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries BIOINFORMATICS Habegger, L., Sboner, A., Gianoulis, T. A., Rozowsky, J., Agarwal, A., Snyder, M., Gerstein, M. 2011; 27 (2): 281-283

    Abstract

    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

    View details for DOI 10.1093/bioinformatics/btq643

    View details for Web of Science ID 000286215200025

    View details for PubMedID 21134889

    View details for PubMedCentralID PMC3018817
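    The RSEQtools abstract above lists calculating gene expression values among the suite's common tasks. One widely used expression measure for RNA-Seq is RPKM (reads per kilobase of transcript per million mapped reads). The sketch below computes it from raw counts; whether RSEQtools reports exactly this quantity is an assumption here, and the function name `rpkm` is hypothetical rather than part of the RSEQtools code.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    read_count: reads mapped to the gene.
    gene_length_bp: gene (or transcript) length in base pairs.
    total_mapped_reads: all mapped reads in the sample.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Example: 500 reads on a 2 kb gene in a sample of 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

    Normalizing by both gene length and sequencing depth makes values comparable across genes and across samples of different sizes.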

  • Stat3 is essential for neuronal differentiation through direct transcriptional regulation of the Sox6 gene FEBS LETTERS Snyder, M., Huang, X., Zhang, J. J. 2011; 585 (1): 148-152

    Abstract

    The transcription factor Signal Transducer and Activator of Transcription 3 (Stat3) functions in various cellular processes including neuronal differentiation. We show that the SRY-box containing gene 6 (Sox6) gene, important for neuronal differentiation, is a direct target gene of Stat3. We demonstrate that in response to ligand stimulation, Stat3 binds to the Sox6 promoter and induces its expression. Furthermore, Stat3 is activated and Sox6 is induced during neuronal differentiation of P19 cells in the absence of exogenous ligand treatment. Moreover, using an RNA interference approach, we show that Stat3 is required for Sox6 expression during neuronal differentiation.

    View details for DOI 10.1016/j.febslet.2010.11.030

    View details for Web of Science ID 000285921500025

    View details for PubMedID 21094641

  • RNA sequencing. Methods in molecular biology (Clifton, N.J.) Waern, K., Nagalakshmi, U., Snyder, M. 2011; 759: 125-132

    Abstract

    This chapter describes the RNA sequencing (RNA-Seq) protocol, whereby RNA from yeast cells is prepared for sequencing on an Illumina Genome Analyzer. The protocol can easily be altered to use RNA from a different organism. This chapter covers RNA extraction, cDNA synthesis, cDNA fragmentation, and Illumina cDNA library generation and contains some brief remarks on bioinformatic analysis.

    View details for DOI 10.1007/978-1-61779-173-4_8

    View details for PubMedID 21863485

  • Embryonic Stem Cells: Discovery, Development, and Current Trends STEM CELLS AND REGENERATIVE MEDICINE: FROM MOLECULAR EMBRYOLOGY TO TISSUE ENGINEERING Theodorou, E., Snyder, M., Appasani, K., Appasani, R. K. 2011: 19-43

  • Analyzing In Vivo Metabolite-Protein Interactions By Large-Scale Systematic Analyses. Current protocols in chemical biology Li, X., Snyder, M. 2011; 3 (4): 181-196

    Abstract

    Metabolites interact with proteins in vivo in various ways other than enzymatic reactions. Profiling of such interactions may help disclose unknown molecular mechanisms that regulate protein functions, and provide potential targets for disease treatment. Here we describe a procedure for systematic analyses of metabolite-protein interactions in vivo. This procedure couples protein affinity purification and mass spectrometry to identify metabolite-protein interactions. The primary effort can be completed within one day and scaled to process hundreds of samples in a batch. Originally developed in yeast, the same principle and protocol can be adapted to other organisms.

    View details for PubMedID 22846927

  • The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics GENOME BIOLOGY Gianoulis, T. A., Agarwal, A., Snyder, M., Gerstein, M. B. 2011; 12 (3)

    Abstract

    Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic--for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.

    View details for DOI 10.1186/gb-2011-12-3-r32

    View details for Web of Science ID 000291309200012

    View details for PubMedID 21453526

    View details for PubMedCentralID PMC3129682

  • Regulatory Variation Within and Between Species ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 12 Zheng, W., Gianoulis, T. A., Karczewski, K. J., Zhao, H., Snyder, M. 2011; 12: 327-346

    Abstract

    Understanding how individuals differ from one another and from closely related species is a fundamental problem in biology. Recent evidence suggests that much of the variation both within and between species is due to differential gene regulation. Here we review differential gene regulation focusing on evolutionary-developmental (evo-devo) biology, global comparison of genomic sequences, whole-genome gene expression, and transcription factor (TF) binding profiles. We also explore the relationship between divergence rate of regulatory sequences, coding sequences, and TF binding events using several different measures and discuss their implications in the context of evolution of regulatory networks. Finally, we discuss the current status and future challenges in relating regulatory variation to the divergence across and within species.

    View details for DOI 10.1146/annurev-genom-082908-150139

    View details for Web of Science ID 000295819900014

    View details for PubMedID 21721942

  • Kinase substrate interactions. Methods in molecular biology (Clifton, N.J.) Smith, M. G., Ptacek, J., Snyder, M. 2011; 723: 201-212

    Abstract

    Kinases have become popular therapeutic targets primarily due to their integral role in cell cycle and tumor progression. The efficacy of high-throughput screening efforts is dependent on the development of high quality multiplex tools capable of replacing lower-throughput technologies such as mass spectroscopy or solution-based assays for the study of kinase-substrate interactions. Functional protein microarrays are comprised of thousands of immobilized proteins on glass slides that have been used successfully to identify protein-protein interactions. Here, we describe the application of functional protein microarrays for the identification of the phosphorylation targets of individual protein kinases using highly sensitive radioactive detection and robust informatics algorithms.

    View details for DOI 10.1007/978-1-61779-043-0_13

    View details for PubMedID 21370067

  • Measuring the Evolutionary Rewiring of Biological Networks PLOS COMPUTATIONAL BIOLOGY Shou, C., Bhardwaj, N., Lam, H. Y., Yan, K., Kim, P. M., Snyder, M., Gerstein, M. B. 2011; 7 (1)

    Abstract

    We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships and linux-kernel function dependencies.

    View details for DOI 10.1371/journal.pcbi.1001050

    View details for Web of Science ID 000286652100009

    View details for PubMedID 21253555

    View details for PubMedCentralID PMC3017101
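    The network rewiring abstract above describes quantifying how much two matched networks differ, by analogy with sequence divergence. The sketch below is a deliberately simplified illustration of that idea, not the formalism from the paper: it restricts two undirected networks to their shared nodes, takes the fraction of edges present in only one of them, and divides by a divergence time. The function name `rewiring_rate`, the normalization, and the example edges are all assumptions made for illustration.

```python
def rewiring_rate(edges_a, edges_b, divergence_time):
    """Fraction of edges that differ between two networks, per unit time.

    edges_a, edges_b: iterables of (node, node) pairs over matched node names
        (hypothetical inputs; edges are treated as undirected).
    divergence_time: time separating the two organisms, in any consistent unit.
    """
    if divergence_time <= 0:
        raise ValueError("divergence_time must be positive")

    nodes_a = {node for edge in edges_a for node in edge}
    nodes_b = {node for edge in edges_b for node in edge}
    common = nodes_a & nodes_b

    # Keep only edges whose endpoints exist in both networks.
    a = {frozenset(e) for e in edges_a if set(e) <= common}
    b = {frozenset(e) for e in edges_b if set(e) <= common}

    union = a | b
    if not union:
        return 0.0
    changed = a ^ b  # edges gained or lost between the two networks
    return len(changed) / len(union) / divergence_time
```

    Restricting to shared nodes separates edge rewiring from node gain and loss, which would otherwise inflate the apparent rate.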

  • Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project SCIENCE Gerstein, M. B., Lu, Z. J., Van Nostrand, E. L., Cheng, C., Arshinoff, B. I., Liu, T., Yip, K. Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry, M., Morris, M., Auerbach, R. K., Feng, X., Leng, J., Vielle, A., Niu, W., Rhrissorrakrai, K., Agarwal, A., Alexander, R. P., Barber, G., Brdlik, C. M., Brennan, J., Brouillet, J. J., Carr, A., Cheung, M., Clawson, H., Contrino, S., Dannenberg, L. O., Dernburg, A. F., Desai, A., Dick, L., Dose, A. C., Du, J., Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E. A., Gassmann, R., Good, P. J., Green, P., Gullier, F., Gutwein, M., Guyer, M. S., Habegger, L., Han, T., Henikoff, J. G., Henz, S. R., Hinrichs, A., Holster, H., Hyman, T., Iniguez, A. L., Janette, J., Jensen, M., Kato, M., Kent, W. J., Kephart, E., Khivansara, V., Khurana, E., Kim, J. K., Kolasinska-Zwierz, P., Lai, E. C., Latorre, I., Leahey, A., Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R. F., Lubling, Y., Lyne, R., MacCoss, M., Mackowiak, S. D., Mangone, M., McKay, S., Mecenas, D., Merrihew, G., Miller, D. M., Muroyama, A., Murray, J. I., Ooi, S., Pham, H., Phippen, T., Preston, E. A., Rajewsky, N., Raetsch, G., Rosenbaum, H., Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A., Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F. J., Slightam, C., Smith, R., Spencer, W. C., Stinson, E. O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K., Wang, G., Washington, N. L., Whittle, C. M., Wu, B., Yan, K., Zeller, G., Zha, Z., Zhong, M., Zhou, X., Ahringer, J., Strome, S., Gunsalus, K. C., Micklem, G., Liu, X. S., Reinke, V., Kim, S. K., Hillier, L. W., Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J. D., Waterston, R. H. 2010; 330 (6012): 1775-1787

    Abstract

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

    View details for DOI 10.1126/science.1196914

    View details for Web of Science ID 000285603700031

    View details for PubMedID 21177976

    View details for PubMedCentralID PMC3142569

  • Statistical Issues in Mapping QTLs for RNA-seq Data 19th Annual Meeting of the International-Genetic-Epidemiology-Society Zheng, W., Raha, D., Snyder, M., Zhao, H. WILEY-BLACKWELL. 2010: 942-42

  • Exploring successful community pharmacist-physician collaborative working relationships using mixed methods RESEARCH IN SOCIAL & ADMINISTRATIVE PHARMACY Snyder, M. E., Zillich, A. J., Primack, B. A., Rice, K. R., McGivney, M. A., Pringle, J. L., Smith, R. B. 2010; 6 (4): 307-323

    Abstract

    Collaborative working relationships (CWRs) between community pharmacists and physicians may foster the provision of medication therapy management services, disease state management, and other patient care activities; however, pharmacists have expressed difficulty in developing such relationships. Additional work is needed to understand the specific pharmacist-physician exchanges that effectively contribute to the development of CWR. Data from successful pairs of community pharmacists and physicians may provide further insights into these exchange variables and expand research on models of professional collaboration. To describe the professional exchanges that occurred between community pharmacists and physicians engaged in successful CWRs, using a published conceptual model and tool for quantifying the extent of collaboration. A national pool of experts in community pharmacy practice identified community pharmacists engaged in CWRs with physicians. Five pairs of community pharmacists and physician colleagues participated in individual semistructured interviews, and 4 of these pairs completed the Pharmacist-Physician Collaborative Index (PPCI). Main outcome measures include quantitative (ie, scores on the PPCI) and qualitative information about professional exchanges within 3 domains found previously to influence relationship development: relationship initiation, trustworthiness, and role specification. On the PPCI, participants scored similarly on trustworthiness; however, physicians scored higher on relationship initiation and role specification. The qualitative interviews revealed that when initiating relationships, it was important for many pharmacists to establish open communication through face-to-face visits with physicians. Furthermore, physicians were able to recognize in these pharmacists a commitment for improved patient care. Trustworthiness was established by pharmacists making consistent contributions to care that improved patient outcomes over time. Open discussions regarding professional roles and an acknowledgment of professional norms (ie, physicians as decision makers) were essential. The findings support and extend the literature on pharmacist-physician CWRs by examining the exchange domains of relationship initiation, trustworthiness, and role specification qualitatively and quantitatively among pairs of practitioners. Relationships appeared to develop in a manner consistent with a published model for CWRs, including the pharmacist as relationship initiator, the importance of communication during early stages of the relationship, and an emphasis on high-quality pharmacist contributions.

    View details for DOI 10.1016/j.sapharm.2009.11.008

    View details for Web of Science ID 000285168400005

    View details for PubMedID 21111388

  • Transformation of Candida albicans with a synthetic hygromycin B resistance gene YEAST Basso, L. R., Bartiss, A., Mao, Y., Gast, C. E., Coelho, P. S., Snyder, M., Wong, B. 2010; 27 (12): 1039-1048

    Abstract

    Synthetic genes that confer resistance to the antibiotic nourseothricin in the pathogenic fungus Candida albicans are available, but genes conferring resistance to other antibiotics are not. We found that multiple C. albicans strains were inhibited by hygromycin B, so we designed a 1026 bp gene (CaHygB) that encodes Escherichia coli hygromycin B phosphotransferase with C. albicans codons. CaHygB conferred hygromycin B resistance in C. albicans transformed with ars2-containing plasmids or single-copy integrating vectors. Since CaHygB did not confer nourseothricin resistance and since the nourseothricin resistance marker SAT-1 did not confer hygromycin B resistance, we reasoned that these two markers could be used for homologous gene disruptions in wild-type C. albicans. We used PCR to fuse CaHygB or SAT-1 to approximately 1 kb of 5' and 3' noncoding DNA from C. albicans ARG4, HIS1 and LEU2, and introduced the resulting amplicons into six wild-type C. albicans strains. Homologous targeting frequencies were approximately 50-70%, and disruption of ARG4, HIS1 and LEU2 alleles was verified by the respective transformants' inabilities to grow without arginine, histidine and leucine. CaHygB should be a useful tool for genetic manipulation of different C. albicans strains, including clinical isolates.

    View details for DOI 10.1002/yea.1813

    View details for PubMedID 20737428

  • Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads BMC GENOMICS Martin, J., Bruno, V. M., Fang, Z., Meng, X., Blow, M., Zhang, T., Sherlock, G., Snyder, M., Wang, Z. 2010; 11

    Abstract

    Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

    View details for DOI 10.1186/1471-2164-11-663

    View details for Web of Science ID 000285303000001

    View details for PubMedID 21106091

    View details for PubMedCentralID PMC3152782

  • Extensive In Vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses CELL Li, X., Gianoulis, T. A., Yip, K. Y., Gerstein, M., Snyder, M. 2010; 143 (4): 639-650

    Abstract

    Natural small compounds comprise most cellular molecules and bind proteins as substrates, products, cofactors, and ligands. However, a large-scale investigation of in vivo protein-small metabolite interactions has not been performed. We developed a mass spectrometry assay for the large-scale identification of in vivo protein-hydrophobic small metabolite interactions in yeast and analyzed compounds that bind ergosterol biosynthetic proteins and protein kinases. Many of these proteins bind small metabolites; a few interactions were previously known, but the vast majority are new. Importantly, many key regulatory proteins such as protein kinases bind metabolites. Ergosterol was found to bind many proteins and may function as a general regulator. It is required for the activity of Ypk1, a mammalian AKT/SGK kinase homolog. Our study defines potential key regulatory steps in lipid biosynthetic pathways and suggests that small metabolites may play a more general role as regulators of protein activity and function than previously appreciated.

    View details for DOI 10.1016/j.cell.2010.09.048

    View details for Web of Science ID 000284149100020

    View details for PubMedID 21035178

    View details for PubMedCentralID PMC3005334

  • A map of human genome variation from population-scale sequencing NATURE Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., De La Vega, F. M., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D., Peltonen, L., Schafer, A. J., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E. S., Altshuler, D. L., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Jaffe, D. B., Shefler, E., Sougnez, C. L., Bentley, D. R., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K. J., Costa, G. L., Ichikawa, J. K., Lee, C. C., Sudbrak, R., Lehrach, H., Borodina, T. A., Dahl, A., Davydov, A. N., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A. V., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R. M., Burton, J., Carter, D. M., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H. P., Turner, D., De Witte, A., Giles, S., Gibbs, R. A., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Yang, H., Marth, G. T., Garrison, E. 
P., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., Daly, M. J., DePristo, M. A., Altshuler, D. L., Ball, A. D., Banks, E., Bloom, T., Browning, B. L., Cibulskis, K., Fennell, T. J., Garimella, K. V., Grossman, S. R., Handsaker, R. E., Hanna, M., Hartl, C., Jaffe, D. B., Kernytsky, A. M., Korn, J. M., Li, H., Maguire, J. R., McCarroll, S. A., McKenna, A., Nemesh, J. C., Philippakis, A. A., Poplin, R. E., Price, A., Rivas, M. A., Sabeti, P. C., Schaffner, S. F., Shefler, E., Shlyakhter, I. A., Cooper, D. N., Ball, E. V., Mort, M., Phillips, A. D., Stenson, P. D., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Bustamante, C. D., Clark, A. G., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R. N., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R. E., Zalunin, V., Zheng-Bradley, X., Korbel, J. O., Stuetz, A. M., Humphray, S., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Hyland, F. C., Manning, J. M., McLaughlin, S. F., Peckham, H. E., Sakarya, O., Sun, Y. A., Tsung, E. F., Batzer, M. A., Konkel, M. K., Walker, J. A., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Herwig, R., Parkhomchuk, D. V., Sherry, S. T., Agarwala, R., Khouri, H., Morgulis, A. O., Paschall, J. E., Phan, L. D., Rotmistrovsky, K. E., Sanders, R. D., Shumway, M. F., Xiao, C., McVean, G. A., Auton, A., Iqbal, Z., Lunter, G., Marchini, J. L., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D. W., Beckstrom-Sternberg, S. M., Christoforides, A., Kurdoglu, A. A., Pearson, J., Sinari, S. A., Tembe, W. D., Haussler, D., Hinrichs, A. S., Katzman, S. J., Kern, A., Kuhn, R. 
M., Przeworski, M., Hernandez, R. D., Howie, B., Kelley, J. L., Melton, S. C., Abecasis, G. R., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W. O., Ding, J., Kang, H. M., Lathrop, M., Liang, L., Moffatt, M. F., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zoellner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E. A., Zilversmit, M., Jorde, L., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Sahinalp, S. C., Sudmant, P. H., Mardis, E. R., Chen, K., Chinwalla, A., Ding, L., Koboldt, D. C., McLellan, M. D., Dooling, D., Weinstock, G., Wallis, J. W., Wendl, M. C., Zhang, Q., Durbin, R. M., Albers, C. A., Ayub, Q., Balasubramaniam, S., Barrett, J. C., Carter, D. M., Chen, Y., Conrad, D. F., Danecek, P., Dermitzakis, E. T., Hu, M., Huang, N., Hurles, M. E., Jin, H., Jostins, L., Keane, T. M., Keane, T. M., Le, S. Q., Lindsay, S., Long, Q., MacArthur, D. G., Montgomery, S. B., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Li, Y., Luo, R., Marth, G. T., Garrison, E. P., Kural, D., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., McCarroll, S. A., Banks, E., DePristo, M. A., Handsaker, R. E., Hartl, C., Korn, J. M., Li, H., Nemesh, J. C., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R. E., Zheng-Bradley, X., Korbel, J. O., Humphray, S., Cheetham, R. K., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Peckham, H. E., Sun, Y. A., Batzer, M. A., Konkel, M. K., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. 
M., Chen, K., Chinwalla, A., Ding, L., McLellan, M. D., Wallis, J. W., Hurles, M. E., Conrad, D. F., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Gibbs, R. A., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G. T., Garrison, E. P., Indap, A., Leong, W. F., Quinlan, A. R., Stewart, C., Ward, A. N., Wu, J., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Garimella, K. V., Hartl, C., Shefler, E., Sougnez, C. L., Wilkinson, J., Clark, A. G., Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R. E., Zheng-Bradley, X., Sherry, S. T., Khouri, H. M., Paschall, J. E., Shumway, M. F., Xiao, C., McVean, G. A., Katzman, S. J., Abecasis, G. R., Blackwell, T., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Koboldt, D. C., Durbin, R. M., Balasubramaniam, S., Coffey, A., Keane, T. M., MacArthur, D. G., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M. B., Balasubramanian, S., Chakravarti, A., Knoppers, B. M., Peltonen, L., Abecasis, G. R., Bustamante, C. D., Gharani, N., Gibbs, R. A., Jorde, L., Kaye, J. S., Kent, A., Li, T., McGuire, A. L., McVean, G. A., Ossorio, P. N., Rotimi, C. N., Su, Y., Toji, L. H., Tyler-Smith, C., Brooks, L. D., Felsenfeld, A. L., McEwen, J. E., Abdallah, A., Juenger, C. R., Clemm, N. C., Collins, F. S., Duncanson, A., Green, E. D., Guyer, M. S., Peterson, J. L., Schafer, A. J., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., Hurles, M. E., McVean, G. A. 2010; 467 (7319): 1061-1073

    Abstract

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    View details for DOI 10.1038/nature09534

    View details for Web of Science ID 000283548600039

    View details for PubMedCentralID PMC3042601

  • Yeast proteomics and protein microarrays JOURNAL OF PROTEOMICS Chen, R., Snyder, M. 2010; 73 (11): 2147-2157

    Abstract

    Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast including protein microarray technology. The protein microarray technology allows the interrogation of protein-protein, protein-DNA, protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which could have been unachievable with traditional approaches. Discovery of these networks has profound impact on explicating biological processes with a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.

    View details for DOI 10.1016/j.jprot.2010.08.003

    View details for Web of Science ID 000283903000008

    View details for PubMedID 20728591

    View details for PubMedCentralID PMC2949546

  • Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq GENOME RESEARCH Bruno, V. M., Wang, Z., Marjani, S. L., Euskirchen, G. M., Martin, J., Sherlock, G., Snyder, M. 2010; 20 (10): 1451-1458

    Abstract

    Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.

    View details for DOI 10.1101/gr.109553.110

    View details for Web of Science ID 000282375000015

    View details for PubMedID 20810668

    View details for PubMedCentralID PMC2945194

  • Annotating non-coding regions of the genome NATURE REVIEWS GENETICS Alexander, R. P., Fang, G., Rozowsky, J., Snyder, M., Gerstein, M. B. 2010; 11 (8): 559-571

    Abstract

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.

    View details for DOI 10.1038/nrg2814

    View details for Web of Science ID 000279988800012

    View details for PubMedID 20628352

  • Minimum information about a protein affinity reagent (MIAPAR) NATURE BIOTECHNOLOGY Bourbeillon, J., Orchard, S., Benhar, I., Borrebaeck, C., de Daruvar, A., Duebel, S., Frank, R., Gibson, F., Gloriam, D., Haslam, N., Hiltker, T., Humphrey-Smith, I., Hust, M., Juncker, D., Koegl, M., Konthur, Z., Korn, B., Krobitsch, S., Muyldermans, S., Nygren, P., Palcy, S., Polic, B., Rodriguez, H., Sawyer, A., Schlapshy, M., Snyder, M., Stoevesandt, O., Taussig, M. J., Templin, M., Uhlen, M., van der Maarel, S., Wingren, C., Hermjakob, H., Sherman, D. 2010; 28 (7): 650–53

    View details for DOI 10.1038/nbt0710-650

    View details for Web of Science ID 000279723900012

    View details for PubMedID 20622827

  • Initiation of the TORC1-Regulated G(0) Program Requires Igo1/2, which License Specific mRNAs to Evade Degradation via the 5'-3' mRNA Decay Pathway MOLECULAR CELL Talarek, N., Cameroni, E., Jaquenoud, M., Luo, X., Bontron, S., Lippman, S., Devgan, G., Snyder, M., Broach, J. R., De Virgilio, C. 2010; 38 (3): 345-355

    Abstract

    Eukaryotic cell proliferation is controlled by growth factors and essential nutrients, in the absence of which cells may enter into a quiescent (G(0)) state. In yeast, nitrogen and/or carbon limitation causes downregulation of the conserved TORC1 and PKA signaling pathways and, consequently, activation of the PAS kinase Rim15, which orchestrates G(0) program initiation and ensures proper life span by controlling distal readouts, including the expression of specific genes. Here, we report that Rim15 coordinates transcription with posttranscriptional mRNA protection by phosphorylating the paralogous Igo1 and Igo2 proteins. This event, which stimulates Igo proteins to associate with the mRNA decapping activator Dhh1, shelters newly expressed mRNAs from degradation via the 5'-3' mRNA decay pathway, thereby enabling their proper translation during initiation of the G(0) program. These results delineate a likely conserved mechanism by which nutrient limitation leads to stabilization of specific mRNAs that are critical for cell differentiation and life span.

    View details for DOI 10.1016/j.molcel.2010.02.039

    View details for Web of Science ID 000277818400006

    View details for PubMedID 20471941

  • MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains BMC BIOINFORMATICS Lam, H. Y., Kim, P. M., Mok, J., Tonikian, R., Sidhu, S. S., Turk, B. E., Snyder, M., Gerstein, M. B. 2010; 11

    Abstract

    Many protein interactions, especially those involved in signaling, involve short linear motifs consisting of 5-10 amino acid residues that interact with modular protein domains such as the SH3 binding domains and the kinase catalytic domains. One straightforward way of identifying these interactions is by scanning for matches to the motif against all the sequences in a target proteome. However, predicting domain targets by motif sequence alone without considering other genomic and structural information has been shown to be lacking in accuracy. We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments. By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.

    View details for DOI 10.1186/1471-2105-11-243

    View details for Web of Science ID 000279728900007

    View details for PubMedID 20459839

  • Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells NATURE STRUCTURAL & MOLECULAR BIOLOGY Moqtaderi, Z., Wang, J., Raha, D., White, R. J., Snyder, M., Weng, Z., Struhl, K. 2010; 17 (5): 635-U139

    Abstract

    Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.

    View details for DOI 10.1038/nsmb.1794

    View details for Web of Science ID 000277330700020

    View details for PubMedID 20418883

    View details for PubMedCentralID PMC3350333

  • Genetic analysis of variation in transcription factor binding in yeast NATURE Zheng, W., Zhao, H., Mancera, E., Steinmetz, L. M., Snyder, M. 2010; 464 (7292): 1187-U106

    Abstract

    Variation in transcriptional regulation is thought to be a major cause of phenotypic diversity. Although widespread differences in gene expression among individuals of a species have been observed, studies to examine the variability of transcription factor binding on a global scale have not been performed, and thus the extent and underlying genetic basis of transcription factor binding diversity is unknown. By mapping differences in transcription factor binding among individuals, here we present the genetic basis of such variation on a genome-wide scale. Whole-genome Ste12-binding profiles were determined using chromatin immunoprecipitation coupled with DNA sequencing in pheromone-treated cells of 43 segregants of a cross between two highly diverged yeast strains and their parental lines. We identified extensive Ste12-binding variation among individuals, and mapped underlying cis- and trans-acting loci responsible for such variation. We showed that most transcription factor binding variation is cis-linked, and that many variations are associated with polymorphisms residing in the binding motifs of Ste12 as well as those of several proposed Ste12 cofactors. We also identified two trans-factors, AMN1 and FLO8, that modulate Ste12 binding to promoters of more than ten genes under alpha-factor treatment. Neither of these two genes was previously known to regulate Ste12, and we suggest that they may be mediators of gene activity and phenotypic diversity. Ste12 binding strongly correlates with gene expression for more than 200 genes, indicating that binding variation is functional. Many of the variable-bound genes are involved in cell wall organization and biogenesis. Overall, these studies identified genetic regulators of molecular diversity among individuals and provide new insights into mechanisms of gene regulation.

    View details for DOI 10.1038/nature08934

    View details for Web of Science ID 000276891100036

    View details for PubMedID 20237471

    View details for PubMedCentralID PMC2941147

  • Variation in Transcription Factor Binding Among Humans SCIENCE Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O., Snyder, M. 2010; 328 (5975): 232-235

    Abstract

    Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (PolII) and a key regulator of immune responses, nuclear factor kappaB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing PolII binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.

    View details for DOI 10.1126/science.1183621

    View details for Web of Science ID 000276459600043

    View details for PubMedID 20299548

    View details for PubMedCentralID PMC2938768

  • Molecular Mechanisms of Ethanol-Induced Pathogenesis Revealed by RNA-Sequencing PLOS PATHOGENS Camarena, L., Bruno, V., Euskirchen, G., Poggio, S., Snyder, M. 2010; 6 (4)

    Abstract

    Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including upsA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.

    View details for DOI 10.1371/journal.ppat.1000834

    View details for Web of Science ID 000277722400007

    View details for PubMedID 20368969

    View details for PubMedCentralID PMC2848557

  • Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wu, J. Q., Habegger, L., Noisa, P., Szekely, A., Qiu, C., Hutchison, S., Raha, D., Egholm, M., Lin, H., Weissman, S., Cui, W., Gerstein, M., Snyder, M. 2010; 107 (11): 5254-5259

    Abstract

    To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation (N1, early initiation; N2, neural progenitor; and N3, early glial-like) were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.

    View details for DOI 10.1073/pnas.0914114107

    View details for Web of Science ID 000275714300079

    View details for PubMedID 20194744

    View details for PubMedCentralID PMC2841935

  • Personal genome sequencing: current approaches and challenges GENES & DEVELOPMENT Snyder, M., Du, J., Gerstein, M. 2010; 24 (5): 423-431

    Abstract

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., "personal genomes." Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.

    View details for DOI 10.1101/gad.1864110

    View details for Web of Science ID 000275055900001

    View details for PubMedID 20194435

    View details for PubMedCentralID PMC2827837

  • X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Yasukochi, Y., Maruyama, O., Mahajan, M. C., Padden, C., Euskirchen, G. M., Schulz, V., Hirakawa, H., Kuhara, S., Pan, X., Newburger, P. E., Snyder, M., Weissman, S. M. 2010; 107 (8): 3704-3709

    Abstract

    The DNA methylation status of human X chromosomes from male and female neutrophils was identified by high-throughput sequencing of HpaII and MspI digested fragments. In the intergenic and intragenic regions on the X chromosome, the sites outside CpG islands were heavily hypermethylated to the same degree in both genders. Nearly half of X chromosome promoters were either hypomethylated or hypermethylated in both females and males. Nearly one third of X chromosome promoters were a mixture of hypomethylated and heterogeneously methylated sites in females and were hypomethylated in males. Thus, a large fraction of genes that are silenced on the inactive X chromosome are hypomethylated in their promoter regions. These genes frequently belong to the evolutionarily younger strata of the X chromosome. The promoters that were hypomethylated at more than two sites contained most of the genes that escaped silencing on the inactive X chromosome. The overall levels of expression of X-linked genes were indistinguishable in females and males, regardless of the methylation state of the inactive X chromosome. Thus, in addition to DNA methylation, other factors are involved in the fine tuning of gene dosage compensation in neutrophils.

    View details for DOI 10.1073/pnas.0914812107

    View details for Web of Science ID 000275130900077

    View details for PubMedID 20133578

  • Close association of RNA polymerase II and many transcription factors with Pol III genes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raha, D., Wang, Z., Moqtaderi, Z., Wu, L., Zhong, G., Gerstein, M., Struhl, K., Snyder, M. 2010; 107 (8): 3639-3644

    Abstract

    Transcription of the eukaryotic genomes is carried out by three distinct RNA polymerases I, II, and III, whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not, suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.

    View details for DOI 10.1073/pnas.0911315106

    View details for Web of Science ID 000275130900066

    View details for PubMedID 20139302

    View details for PubMedCentralID PMC2840497

  • Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs SCIENCE SIGNALING Mok, J., Kim, P. M., Lam, H. Y., Piccirillo, S., Zhou, X., Jeschke, G. R., Sheridan, D. L., Parker, S. A., Desai, V., Jwa, M., Cameroni, E., Niu, H., Good, M., Remenyi, A., Ma, J. N., Sheu, Y., Sassi, H. E., Sopko, R., Chan, C. S., De Virgilio, C., Hollingsworth, N. M., Lim, W. A., Stern, D. F., Stillman, B., Andrews, B. J., Gerstein, M. B., Snyder, M., Turk, B. E. 2010; 3 (109)

    Abstract

    Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.

    View details for DOI 10.1126/scisignal.2000482

    View details for Web of Science ID 000275647900005

    View details for PubMedID 20159853

    View details for PubMedCentralID PMC2846625

  • Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response PLOS GENETICS Zhong, M., Niu, W., Lu, Z. J., Sarov, M., Murray, J. I., Janette, J., Raha, D., Sheaffer, K. L., Lam, H. Y., Preston, E., Slightham, C., Hillier, L. W., Brock, T., Agarwal, A., Auerbach, R., Hyman, A. A., Gerstein, M., Mango, S. E., Kim, S. K., Waterston, R. H., Reinke, V., Snyder, M. 2010; 6 (2)

    Abstract

    Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.

    View details for DOI 10.1371/journal.pgen.1000848

    View details for Web of Science ID 000275262700016

    View details for PubMedID 20174564

    View details for PubMedCentralID PMC2824807

  • Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library NATURE BIOTECHNOLOGY Lam, H. Y., Mu, X. J., Stuetz, A. M., Tanzer, A., Cayting, P. D., Snyder, M., Kim, P. M., Korbel, J. O., Gerstein, M. B. 2010; 28 (1): 47-U76

    Abstract

    Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.

    View details for DOI 10.1038/nbt.1600

    View details for Web of Science ID 000273430400020

    View details for PubMedID 20037582

  • ChIP-Seq: Using High-Throughput DNA Sequencing for Genome-Wide Identification of Transcription Factor Binding Sites METHODS IN ENZYMOLOGY, VOL 470: GUIDE TO YEAST GENETICS Lefrancois, P., Zheng, W., Snyder, M. 2010; 470: 77-104

    Abstract

    Much of eukaryotic gene regulation is mediated by binding of transcription factors near or within their target genes. Transcription factor binding sites (TFBS) are often identified globally using chromatin immunoprecipitation (ChIP) in which specific protein-DNA interactions are isolated using an antibody against the factor of interest. Coupling ChIP with high-throughput DNA sequencing allows identification of TFBS in a direct, unbiased fashion; this technique is termed ChIP-Sequencing (ChIP-Seq). In this chapter, we describe the yeast ChIP-Seq procedure, including the protocols for ChIP, input DNA preparation, and Illumina DNA sequencing library preparation. Descriptions of Illumina sequencing and data processing and analysis are also included. The use of multiplex short-read sequencing (i.e., barcoding) enables the analysis of many ChIP samples simultaneously, which is especially valuable for organisms with small genomes such as yeast.

    View details for DOI 10.1016/S0076-6879(10)70004-5

    View details for Web of Science ID 000275827900004

    View details for PubMedID 20946807

  • RNA-Seq: a method for comprehensive transcriptome analysis. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Nagalakshmi, U., Waern, K., Snyder, M. 2010; Chapter 4: Unit 4.11.1-13

    Abstract

    A recently developed technique called RNA Sequencing (RNA-Seq) uses massively parallel sequencing to allow transcriptome analyses of genomes at a far higher resolution than is available with Sanger sequencing- and microarray-based methods. In the RNA-Seq method, complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies. The reads obtained from this can then be aligned to a reference genome in order to construct a whole-genome transcriptome map. RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5' and 3' ends of genes, and map exon/intron boundaries. This unit describes protocols for performing RNA-Seq using the Illumina sequencing platform.

    View details for DOI 10.1002/0471142727.mb0411s89

    View details for PubMedID 20069539

  • Systems biology approaches to disease marker discovery DISEASE MARKERS Sharon, D., Chen, R., Snyder, M. 2010; 28 (4): 209-224

    Abstract

    Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.

    View details for DOI 10.3233/DMA-2010-0707

    View details for Web of Science ID 000279321200003

    View details for PubMedID 20534906

  • EBNA1 regulates cellular gene expression by binding cellular promoters PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Canaan, A., Haviv, I., Urban, A. E., Schulz, V. P., Hartman, S., Zhang, Z., Palejev, D., Deisseroth, A. B., Lacy, J., Snyder, M., Gerstein, M., Weissman, S. M. 2009; 106 (52): 22421-22426

    Abstract

    Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

    View details for DOI 10.1073/pnas.0911676106

    View details for Web of Science ID 000273178700069

    View details for PubMedID 20080792

  • Mapping accessible chromatin regions using Sono-Seq PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Auerbach, R. K., Euskirchen, G., Rozowsky, J., Lamarre-Vincent, N., Moqtaderi, Z., Lefrancois, P., Struhl, K., Gerstein, M., Snyder, M. 2009; 106 (35): 14926-14931

    Abstract

    Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.

    View details for DOI 10.1073/pnas.0905443106

    View details for Web of Science ID 000269481000036

    View details for PubMedID 19706456

  • Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes MOLECULAR SYSTEMS BIOLOGY Kung, L. A., Tao, S., Qian, J., Smith, M. G., Snyder, M., Zhu, H. 2009; 5

    Abstract

    To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.

    View details for DOI 10.1038/msb.2009.64

    View details for Web of Science ID 000270456400002

    View details for PubMedID 19756047

  • Impact of Chromatin Structures on DNA Processing for Genomic Analyses PLOS ONE Teytelman, L., Oezaydin, B., Zill, O., Lefrancois, P., Snyder, M., Rine, J., Eisen, M. B. 2009; 4 (8)

    Abstract

    Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.

    View details for DOI 10.1371/journal.pone.0006700

    View details for Web of Science ID 000269267400008

    View details for PubMedID 19693276

  • Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo NATURE STRUCTURAL & MOLECULAR BIOLOGY Zhang, Y., Moqtaderi, Z., Rattner, B. P., Euskirchen, G., Snyder, M., Kadonaga, J. T., Liu, X. S., Struhl, K. 2009; 16 (8): 847-U70

    Abstract

    We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over Escherichia coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro show strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor have fewer nucleosome-free regions, reduced rotational positioning and less translational positioning than obtained by intrinsic histone-DNA interactions. Notably, nucleosomes assembled in vitro have only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.

    View details for DOI 10.1038/nsmb.1636

    View details for Web of Science ID 000268738700012

    View details for PubMedID 19620965

  • The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Tirosh-Wagner, T., Urban, A. E., Chen, X., Kasowski, M., Dai, L., Grubert, F., Erdman, C., Gao, M. C., Lange, K., Sobel, E. M., Barlow, G. M., Aylsworth, A. S., Carpenter, N. J., Clark, R. D., Cohen, M. Y., Doran, E., Falik-Zaccai, T., Lewin, S. O., Lott, I. T., McGillivray, B. C., Moeschler, J. B., Pettenati, M. J., Pueschel, S. M., Rao, K. W., Shaffer, L. G., Shohat, M., Van Riper, A. J., Warburton, D., Weissman, S., Gerstein, M. B., Snyder, M., Korenberg, J. R. 2009; 106 (29): 12031-12036

    Abstract

    Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear as to whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations, including acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.

    View details for DOI 10.1073/pnas.0813248106

    View details for Web of Science ID 000268178400040

    View details for PubMedID 19597142

  • Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants PLOS COMPUTATIONAL BIOLOGY Du, J., Bjornson, R. D., Zhang, Z. D., Kong, Y., Snyder, M., Gerstein, M. B. 2009; 5 (7)

    Abstract

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

    View details for DOI 10.1371/journal.pcbi.1000432

    View details for Web of Science ID 000269220100023

    View details for PubMedID 19593373

  • Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles JOURNAL OF PROTEOME RESEARCH Rodriguez, H., Snyder, M., Uhlen, M., Andrews, P., Beavis, R., Borchers, C., Chalkley, R. J., Cho, S. Y., Cottingham, K., Dunn, M., Dylag, T., Edgar, R., Hare, P., Heck, A. J., Hirsch, R. F., Kennedy, K., Kolar, P., Kraus, H., Mallick, P., Nesvizhskii, A., Ping, P., Ponten, F., Yang, L., Yates, J. R., Stein, S. E., Hermjakob, H., Kinsinger, C. R., Apweiler, R. 2009; 8 (7): 3689-3692

    Abstract

    Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.

    View details for DOI 10.1021/pr900023z

    View details for Web of Science ID 000267694600043

    View details for PubMedID 19344107

  • Unlocking the secrets of the genome NATURE Celniker, S. E., Dillon, L. A., Gerstein, M. B., Gunsalus, K. C., Henikoff, S., Karpen, G. H., Kellis, M., Lai, E. C., Lieb, J. D., MacAlpine, D. M., Micklem, G., Piano, F., Snyder, M., Stein, L., White, K. P., Waterston, R. H. 2009; 459 (7249): 927-930

    View details for DOI 10.1038/459927a

    View details for Web of Science ID 000267063500031

    View details for PubMedID 19536255

  • Dynamic and complex transcription factor binding during an inducible response in yeast GENES & DEVELOPMENT Ni, L., Bruce, C., Hart, C., Leigh-Bell, J., Gelperin, D., Umansky, L., Gerstein, M. B., Snyder, M. 2009; 23 (11): 1351-1363

    Abstract

    Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.

    View details for DOI 10.1101/gad.1781909

    View details for Web of Science ID 000266524100009

    View details for PubMedID 19487574

  • Integrated analysis of co-expressed MAP kinase substrates in Arabidopsis thaliana. Plant signaling & behavior Popescu, S. C., Popescu, G. V., Snyder, M., Dinesh-Kumar, S. P. 2009; 4 (6): 524-527

    View details for PubMedID 19816141

  • Distinct Genomic Aberrations Associated with ERG Rearranged Prostate Cancer GENES CHROMOSOMES & CANCER Demichelis, F., Setlur, S. R., Beroukhim, R., Perner, S., Korbel, J. O., LaFargue, C. J., Pflueger, D., Pina, C., Hofer, M. D., Sboner, A., Svensson, M. A., Rickman, D. S., Urban, A., Snyder, M., Meyerson, M., Lee, C., Gerstein, M. B., Kuefer, R., Rubin, M. A. 2009; 48 (4): 366-380

    Abstract

    Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pin-point breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements.

    View details for DOI 10.1002/gcc.20647

    View details for Web of Science ID 000263572700007

    View details for PubMedID 19156837

  • A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster BLOOD Zhang, X., Lian, Z., Padden, C., Gerstein, M. B., Rozowsky, J., Snyder, M., Gingeras, T. R., Kapranov, P., Weissman, S. M., Newburger, P. E. 2009; 113 (11): 2526-2534

    Abstract

    We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is up-regulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of HOXA1. The transcript appears to be a noncoding RNA containing no long open-reading frame; sucrose gradient analysis shows no association with polyribosomal fractions. HOTAIRM1 is the most prominent intergenic transcript expressed and up-regulated during induced granulocytic differentiation of NB4 promyelocytic leukemia and normal human hematopoietic cells; its expression is specific to the myeloid lineage. Its induction during retinoic acid (RA)-driven granulocytic differentiation is through RA receptor and may depend on the expression of myeloid cell development factors targeted by RA signaling. Knockdown of HOTAIRM1 quantitatively blunted RA-induced expression of HOXA1 and HOXA4 during the myeloid differentiation of NB4 cells, and selectively attenuated induction of transcripts for the myeloid differentiation genes CD11b and CD18, but did not noticeably impact the more distal HOXA genes. These findings suggest that HOTAIRM1 plays a role in myelopoiesis through modulation of gene expression in the HOXA cluster.

    View details for DOI 10.1182/blood-2008-06-162164

    View details for Web of Science ID 000264110600021

    View details for PubMedID 19144990

  • A high throughput embryonic stem cell screen identifies Oct-2 as a bifunctional regulator of neuronal differentiation GENES & DEVELOPMENT Theodorou, E., Dalembert, G., Heffelfinger, C., White, E., Weissman, S., Corcoran, L., Snyder, M. 2009; 23 (5): 575-588

    Abstract

    Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation we developed a high throughput screen using embryonic stem (ES) cells. Seven-hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: It can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.

    View details for DOI 10.1101/gad.1772509

    View details for Web of Science ID 000263918500005

    View details for PubMedID 19270158

  • Quantifying environmental adaptation of metabolic pathways in metagenomics PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gianoulis, T. A., Raes, J., Patel, P. V., Bjornson, R., Korbel, J. O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L. J., Snyder, M., Bork, P., Gerstein, M. B. 2009; 106 (5): 1374-1379

    Abstract

    Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relate to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.

    View details for DOI 10.1073/pnas.0808022106

    View details for Web of Science ID 000263074600018

    View details for PubMedID 19164758

  • Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing BMC GENOMICS Lefrancois, P., Euskirchen, G. M., Auerbach, R. K., Rozowsky, J., Gibson, T., Yellman, C. M., Gerstein, M., Snyder, M. 2009; 10

    Abstract

    Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA Pol II) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for Pol II. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.

    View details for DOI 10.1186/1471-2164-10-37

    View details for Web of Science ID 000264970100002

    View details for PubMedID 19159457
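
    The multiplexing idea in the abstract above (several barcoded samples sequenced together, then separated for analysis) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each read begins with a fixed-length barcode that must match exactly, and the barcode sequences and the function name demultiplex are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name
              (hypothetical values, not the published adapters).
    Reads whose prefix matches no barcode go under "unassigned".
    """
    length = len(next(iter(barcodes)))
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")

    by_sample = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length])  # exact-match lookup (assumption)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[length:])
    return dict(by_sample)


# Hypothetical barcodes for two of the samples named in the abstract.
example_barcodes = {"ACGT": "Ste12", "TGCA": "Cse4"}
example_reads = ["ACGTAAAA", "TGCACCCC", "GGGGTTTT"]
example_result = demultiplex(example_reads, example_barcodes)
```

    A production tool would usually also tolerate sequencing errors in the barcode; exact matching is used here only to keep the sketch short.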

  • Proteomic-Based Detection of a Protein Cluster Dysregulated during Cardiovascular Development Identifies Biomarkers of Congenital Heart Defects PLOS ONE Nath, A. K., Krauthammer, M., Li, P., Davidov, E., Butler, L. C., Copel, J., Katajamaa, M., Oresic, M., Buhimschi, I., Buhimschi, C., Snyder, M., Madri, J. A. 2009; 4 (1)

    Abstract

    Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects (CHDs). However, the molecular pathways that underlie these structural defects in humans remain largely unknown, hindering the development of molecular-based diagnostic tools and novel therapies. Murine embryos were exposed to high glucose, a condition known to induce cardiovascular defects in both animal models and humans. We further employed a mass spectrometry-based proteomics approach to identify proteins differentially expressed in embryos with defects from those with normal cardiovascular development. The proteins detected by mass spectrometry (WNT16, ST14, Pcsk1, Jumonji, Morca2a, TRPC5, and others) were validated by Western blotting and immunofluorescent staining of the yolk sac and heart. The proteins within the proteomic dataset clustered to adhesion/migration, differentiation, transport, and insulin signaling pathways. A functional role for several proteins (WNT16, ADAM15 and NOGO-A/B) was demonstrated in an ex vivo model of heart development. Additionally, a successful application of a cluster of protein biomarkers (WNT16, ST14 and Pcsk1) as a prenatal screen for CHDs was confirmed in a study of human amniotic fluid (AF) samples from women carrying normal fetuses and those with CHDs. The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs. The information gained through this bed-side to bench translational approach contributes to a more complete understanding of the protein pathways dysregulated during cardiovascular development and provides novel avenues for diagnostic and therapeutic interventions, beneficial to fetuses at risk for CHDs.

    View details for DOI 10.1371/journal.pone.0004221

    View details for Web of Science ID 000265481900004

    View details for PubMedID 19156209

  • Three Distinct Condensin Complexes Control C. elegans Chromosome Dynamics CURRENT BIOLOGY Csankovszki, G., Collette, K., Spahl, K., Carey, J., Snyder, M., Petty, E., Patel, U., Tabuchi, T., Liu, H., Mcleod, I., Thompson, J., Sarkesik, A., Yates, J., Meyer, B. J., Hagstrom, K. 2009; 19 (1): 9-19

    Abstract

    Condensin complexes organize chromosome structure and facilitate chromosome segregation. Higher eukaryotes have two complexes, condensin I and condensin II, each essential for chromosome segregation. The nematode Caenorhabditis elegans was considered an exception, because it has a mitotic condensin II complex but appeared to lack mitotic condensin I. Instead, its condensin I-like complex (here called condensin I(DC)) dampens gene expression along hermaphrodite X chromosomes during dosage compensation. Here we report the discovery of a third condensin complex, condensin I, in C. elegans. We identify new condensin subunits and show that each complex has a conserved five-subunit composition. Condensin I differs from condensin I(DC) by only a single subunit. Yet condensin I binds to autosomes and X chromosomes in both sexes to promote chromosome segregation, whereas condensin I(DC) binds specifically to X chromosomes in hermaphrodites to regulate transcript levels. Both condensin I and II promote chromosome segregation, but associate with different chromosomal regions during mitosis and meiosis. Unexpectedly, condensin I also localizes to regions of cohesion between meiotic chromosomes before their segregation. We demonstrate that condensin subunits in C. elegans form three complexes, one that functions in dosage compensation and two that function in mitosis and meiosis. These results highlight how the duplication and divergence of condensin subunits during evolution may facilitate their adaptation to specialized chromosomal roles and illustrate the versatility of condensins to function in both gene regulation and chromosome segregation.

    View details for DOI 10.1016/j.cub.2008.12.006

    View details for Web of Science ID 000262584100022

    View details for PubMedID 19119011

  • PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data GENOME BIOLOGY Korbel, J. O., Abyzov, A., Mu, X. J., Carriero, N., Cayting, P., Zhang, Z., Snyder, M., Gerstein, M. B. 2009; 10 (2)

    Abstract

    Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.

    View details for DOI 10.1186/gb-2009-10-2-r23

    View details for Web of Science ID 000266345600020

    View details for PubMedID 19236709

  • RNA-Seq: a revolutionary tool for transcriptomics NATURE REVIEWS GENETICS Wang, Z., Gerstein, M., Snyder, M. 2009; 10 (1): 57-63

    Abstract

    RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

    View details for DOI 10.1038/nrg2484

    View details for Web of Science ID 000261866500012

    View details for PubMedID 19015660

  • MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays GENES & DEVELOPMENT Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2009; 23 (1): 80-92

    Abstract

    Signaling through mitogen-activated protein kinase (MPK) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs). However, to date only a limited number of MKK-MPK interactions and MPK phosphorylation substrates have been revealed. We determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe high-density protein microarrays to determine their phosphorylation targets. Our analyses revealed known and novel signaling modules encompassing 570 MPK phosphorylation substrates; these substrates were enriched in transcription factors involved in the regulation of development, defense, and stress responses. Selected MPK substrates were validated by in planta reconstitution experiments. A subset of activated and wild-type MKKs induced cell death, indicating a possible role for these MKKs in the regulation of cell death. Interestingly, MKK7- and MKK9-induced death requires Sgt1, a known regulator of cell death induced during plant innate immunity. Our predicted MKK-MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.

    View details for DOI 10.1101/gad.1740009

    View details for Web of Science ID 000262369700008

    View details for PubMedID 19095804

  • PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls NATURE BIOTECHNOLOGY Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z. D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., Gerstein, M. B. 2009; 27 (1): 66-75

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.

    View details for DOI 10.1038/nbt.1518

    View details for Web of Science ID 000262471200025

    View details for PubMedID 19122651

  • Protein microarrays. Methods in molecular biology (Clifton, N.J.) Fasolo, J., Snyder, M. 2009; 548: 209-222

    Abstract

    Protein microarrays containing nearly the entire yeast proteome have been constructed. They are typically prepared by overexpression and high-throughput purification and printing onto microscope slides. The arrays can be used to screen nearly the entire proteome in an unbiased fashion and have enormous utility for a variety of applications. These include protein-protein interactions, identification of novel lipid- and nucleic acid-binding proteins, and finding targets of small molecules, protein kinases, and other modification enzymes. Protein microarrays are thus powerful tools for individual studies as well as systematic characterization of proteins and their biochemical activities and regulation.

    View details for DOI 10.1007/978-1-59745-540-4_12

    View details for PubMedID 19521827

  • Global identification of protein kinase substrates by protein microarray analysis NATURE PROTOCOLS Mok, J., Im, H., Snyder, M. 2009; 4 (12): 1820-1827

    Abstract

    Herein, we describe a protocol for the global identification of in vitro substrates targeted by protein kinases using protein microarray technology. Large numbers of fusion proteins tagged at their carboxy-termini are purified in 96-well format and spotted in duplicate onto amino-silane-coated slides in a spatially addressable manner. These arrays are incubated in the presence of purified kinase and radiolabeled ATP, and then washed, dried and analyzed by autoradiography. The extent of phosphorylation of each spot is quantified and normalized, and proteins that are reproducibly phosphorylated in the presence of the active kinase relative to control slides are scored as positive substrates. This approach enables the rapid determination of kinase-substrate relationship on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems. Expression, purification and printing of the yeast proteome require about 3 weeks. Afterwards, each kinase assay takes approximately 3 h to perform.

    View details for DOI 10.1038/nprot.2009.194

    View details for Web of Science ID 000274226100011

    View details for PubMedID 20010933

    View details for PubMedCentralID PMC3382020

  • MSB: A mean-shift-based approach for the analysis of structural variation in the genome GENOME RESEARCH Wang, L., Abyzov, A., Korbel, J. O., Snyder, M., Gerstein, M. 2009; 19 (1): 106-117

    Abstract

    Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based on particular assumptions. Often, they optimize likelihood functions for estimating model parameters, by expectation maximization or related approaches; however, this requires good parameter initialization through prespecifying the number of segments. Moreover, convergence is difficult to achieve, since many parameters are required to characterize an experiment. To overcome these limitations, we propose a nonparametric method without a global criterion to be optimized. Our method involves mean-shift-based (MSB) procedures; it considers the observed array-CGH signal as sampling from a probability-density function, uses a kernel-based approach to estimate local gradients for this function, and iteratively follows them to determine local modes of the signal. Overall, our method achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion. It does not require the number of segments as input, nor does its convergence depend on this. We successfully applied our method to both simulated data and array-CGH experiments on glioblastoma and adenocarcinoma. We show that it performs at least as well as, and often better than, 10 previously published algorithms. Finally, we show that our approach can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.

    View details for DOI 10.1101/gr.080069.108

    View details for Web of Science ID 000262200000010

    View details for PubMedID 19037015
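
    The mean-shift idea described in the abstract above (treat the signal as samples from a density, estimate local gradients with a kernel, and follow them iteratively to local modes) can be illustrated with a generic sketch. This is a textbook mean-shift filter over the joint (probe position, signal value) space, not the published MSB code: the Gaussian kernel, the bandwidths h_pos and h_val, and the function name mean_shift_smooth are all assumptions chosen for illustration.

```python
import math


def mean_shift_smooth(signal, h_pos=3.0, h_val=0.2, max_iter=100, tol=1e-6):
    """Discontinuity-preserving smoothing of a 1D signal by mean shift.

    Each probe starts at (index, value) and is moved repeatedly to the
    kernel-weighted mean of all probes until the shift is below tol or
    max_iter is reached. The converged value is the smoothed signal.
    h_pos and h_val are illustrative bandwidths (assumptions).
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        x, y = float(i), float(signal[i])
        for _ in range(max_iter):
            wsum = xsum = ysum = 0.0
            for j in range(n):
                # Gaussian weight in both position and signal value, so
                # probes across a copy-number jump contribute very little.
                w = math.exp(
                    -0.5 * ((j - x) / h_pos) ** 2
                    - 0.5 * ((signal[j] - y) / h_val) ** 2
                )
                wsum += w
                xsum += w * j
                ysum += w * signal[j]
            new_x, new_y = xsum / wsum, ysum / wsum
            shift = abs(new_x - x) + abs(new_y - y)
            x, y = new_x, new_y
            if shift < tol:
                break
        smoothed.append(y)
    return smoothed


# Synthetic log-ratio profile: ten probes at 0.0 followed by ten at 1.0.
step_profile = [0.0] * 10 + [1.0] * 10
step_smoothed = mean_shift_smooth(step_profile)
```

    Because the kernel weights fall off in signal value as well as in position, the step between the two segments is not blurred, and no segment count has to be supplied in advance.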

  • Mismatch oligonucleotides in human and yeast: guidelines for probe design on tiling microarrays BMC GENOMICS Seringhaus, M., Rozowsky, J., Royce, T., Nagalakshmi, U., Jee, J., Snyder, M., Gerstein, M. 2008; 9

    Abstract

    Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization behavior of various degrees of mismatch (MM) structure has not been extensively studied. Here, we present the results of two large-scale microarray experiments on S. cerevisiae and H. sapiens genomic DNA, to explore MM oligonucleotide behavior with real sample mixtures under tiling-array conditions. We examined all possible nucleotide substitutions at the central position of 36-nucleotide probes, and found that nonspecific binding by MM oligos depends upon the individual nucleotide substitutions they incorporate: C-->A, C-->G and T-->A (yielding purine-purine mispairs) are most disruptive, whereas A-->X were least disruptive. We also quantify a marked GC skew effect: substitutions raising probe GC content exhibit higher intensity (and vice versa). This skew is small in highly-expressed regions (+/- 0.5% of total intensity range) and large (+/- 2% or more) elsewhere. Multiple mismatches per oligo are largely additive in effect: each MM added in a distributed fashion causes an additional 21% intensity drop relative to PM, three-fold more disruptive than adding adjacent mispairs (7% drop per MM). We investigate several parameters for oligonucleotide design, including the effects of each central nucleotide substitution on array signal intensity and of multiple MM per oligo. To avoid GC skew, individual substitutions should not alter probe GC content. RNA sample mixture complexity may increase the amount of nonspecific hybridization, magnify GC skew and boost the intensity of MM oligos at all levels.

    View details for DOI 10.1186/1471-2164-9-635

    View details for Web of Science ID 000264109200001

    View details for PubMedID 19117516

  • Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history GENOME RESEARCH Kim, P. M., Lam, H. Y., Urban, A. E., Korbel, J. O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (12): 1865-1874

    Abstract

    Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.

    View details for DOI 10.1101/gr.081422.108

    View details for Web of Science ID 000261398900002

    View details for PubMedID 18842824

  • Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding GENOME RESEARCH Robertson, A. G., Bilenky, M., Tam, A., Zhao, Y., Zeng, T., Thiessen, N., Cezard, T., Fejes, A. P., Wederell, E. D., Cullum, R., Euskirchen, G., Krzywinski, M., Birol, I., Snyder, M., Hoodless, P. A., Hirst, M., Marra, M. A., Jones, S. J. 2008; 18 (12): 1906-1917

    Abstract

    We characterized the relationship of H3K4me1 and H3K4me3 at distal and proximal regulatory elements by comparing ChIP-seq profiles for these histone modifications and for two functionally different transcription factors: STAT1 in the immortalized HeLa S3 cell line, with and without interferon-gamma (IFNG) stimulation; and FOXA2 in mouse adult liver tissue. In unstimulated and stimulated HeLa cells, respectively, we determined approximately 270,000 and approximately 301,000 H3K4me1-enriched regions, and approximately 54,500 and approximately 76,100 H3K4me3-enriched regions. In mouse adult liver, we determined approximately 227,000 and approximately 34,800 H3K4me1 and H3K4me3 regions. Seventy-five percent of the approximately 70,300 STAT1 binding sites in stimulated HeLa cells and 87% of the approximately 11,000 FOXA2 sites in mouse liver were distal to known gene TSS; in both cell types, approximately 83% of these distal sites were associated with at least one of the two histone modifications, and H3K4me1 was associated with over 96% of marked distal sites. After filtering against predicted transcription start sites, 50% of approximately 26,800 marked distal IFNG-stimulated STAT1 binding sites, but 95% of approximately 5800 marked distal FOXA2 sites, were associated with H3K4me1 only. Results for HeLa cells generated additional insights into transcriptional regulation involving STAT1. STAT1 binding was associated with 25% of all H3K4me1 regions in stimulated HeLa cells, suggesting that a single transcription factor can interact with an unexpectedly large fraction of regulatory regions. Strikingly, for a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinity STAT1 binding.

    View details for DOI 10.1101/gr.078519.108

    View details for Web of Science ID 000261398900006

    View details for PubMedID 18787082

  • High-Resolution Copy-Number Variation Map Reflects Human Olfactory Receptor Diversity and Evolution PLOS GENETICS Hasin, Y., Olender, T., Khen, M., Gonzaga-Jauregui, C., Kim, P. M., Urban, A. E., Snyder, M., Gerstein, M. B., Lancet, D., Korbel, J. O. 2008; 4 (11)

    Abstract

    Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (approximately 55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that approximately 50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

    View details for DOI 10.1371/journal.pgen.1000249

    View details for Web of Science ID 000261481000004

    View details for PubMedID 18989455

  • A procedure for highly specific, sensitive, and unbiased whole-genome amplification PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Urban, A. E., Palejev, D., Schulz, V., Grubert, F., Hu, Y., Snyder, M., Weissman, S. M. 2008; 105 (40): 15499-15504

    Abstract

    Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22 and even a homozygous deletion smaller than 1 kb with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy compared favorably with results on unamplified DNA. Importantly, underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for the amplicons with a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; intact cells were amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis with single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.

    View details for DOI 10.1073/pnas.0808028105

    View details for Web of Science ID 000260360500052

    View details for PubMedID 18832167

  • High-quality binary protein interaction map of the yeast interactome network SCIENCE Yu, H., Braun, P., Yildirim, M. A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., Hao, T., Rual, J., Dricot, A., Vazquez, A., Murray, R. R., Simon, C., Tardivo, L., Tam, S., Svrzikapa, N., Fan, C., De Smet, A., Motyl, A., Hudson, M. E., Park, J., Xin, X., Cusick, M. E., Moore, T., Boone, C., Snyder, M., Roth, F. P., Barabasi, A., Tavernier, J., Hill, D. E., Vidal, M. 2008; 322 (5898): 104-110

    Abstract

    Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.

    View details for DOI 10.1126/science.1158684

    View details for Web of Science ID 000259680200048

    View details for PubMedID 18719252

  • Modeling ChIP Sequencing In Silico with Applications PLOS COMPUTATIONAL BIOLOGY Zhang, Z. D., Rozowsky, J., Snyder, M., Chang, J., Gerstein, M. 2008; 4 (8)

    Abstract

    ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.

    View details for DOI 10.1371/journal.pcbi.1000158

    View details for Web of Science ID 000260041300021

    View details for PubMedID 18725927

  • A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3' end RNA polyadenylation GENOME RESEARCH Lian, Z., Karpikov, A., Lian, J., Mahajan, M. C., Hartman, S., Gerstein, M., Snyder, M., Weissman, S. M. 2008; 18 (8): 1224-1237

    Abstract

    Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3' end formation and transcription termination. We used a novel approach to prepare 3' end fragments from polyadenylated RNA, and mapped the position of the poly(A) addition site using oligonucleotide arrays tiling 1% of the human genome. This approach revealed more 3' ends than had been annotated. The distribution of these ends relative to RNA polymerase II (PolII) and di- and trimethylated lysine 4 and lysine 36 of histone H3 was compared. A substantial fraction of unannotated 3' ends of RNA are intronic and antisense to the embedding gene. Poly(A) ends of annotated messages lie on average 2 kb upstream of the end of PolII binding (termination). Near the termination sites, and in some internal sites, unphosphorylated and C-terminal domain (CTD) serine 2 phosphorylated PolII (POLR2A) accumulate, suggesting pausing of the polymerase and perhaps dephosphorylation prior to release. Lysine 36 trimethylation occurs across transcribed genes, sometimes alternating with stretches of DNA in which lysine 36 dimethylation is more prominent. Lysine 36 methylation decreases at or near the site of polyadenylation, sometimes disappearing before disappearance of phosphorylated RNA PolII or release of PolII from DNA. Our results suggest that transcription termination involves loss of histone 3 lysine 36 methylation and later release of RNA polymerase. The latter is often associated with polymerase pausing. Overall, our study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3' end formation.

    View details for DOI 10.1101/gr.075804.107

    View details for Web of Science ID 000258116100004

    View details for PubMedID 18487515

  • Genome-Wide Occupancy of SREBP1 and Its Partners NFY and SP1 Reveals Novel Functional Roles and Combinatorial Regulation of Distinct Classes of Genes PLOS GENETICS Reed, B. D., Charos, A. E., Szekely, A. M., Weissman, S. M., Snyder, M. 2008; 4 (7)

    Abstract

    The sterol regulatory element-binding protein (SREBP) family member SREBP1 is a critical transcriptional regulator of cholesterol and fatty acid metabolism and has been implicated in insulin resistance, diabetes, and other diet-related diseases. We globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip). We find that SREBP1 occupies the promoters of 1,141 target genes involved in diverse biological pathways, including novel targets with roles in lipid metabolism and insulin signaling. We also identify a conserved SREBP1 DNA-binding motif in SREBP1 target promoters, and we demonstrate that many SREBP1 target genes are transcriptionally activated by treatment with insulin and glucose using gene expression microarrays. Finally, we show that SREBP1 cooperates extensively with NFY and SP1 throughout the genome and that unique combinations of these factors target distinct functional pathways. Our results provide insight into the regulatory circuitry in which SREBP1 and its network partners coordinate a complex transcriptional response in the liver with cues from the diet.

    View details for DOI 10.1371/journal.pgen.1000133

    View details for Web of Science ID 000260410600005

    View details for PubMedID 18654640

  • The transcriptional landscape of the yeast genome defined by RNA sequencing SCIENCE Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., Snyder, M. 2008; 320 (5881): 1344-1349

    Abstract

    The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

    View details for DOI 10.1126/science.1158441

    View details for Web of Science ID 000256441100046

    View details for PubMedID 18451266

  • The current excitement about copy-number variation: how it relates to gene duplications and protein families CURRENT OPINION IN STRUCTURAL BIOLOGY Korbel, J. O., Kim, P. M., Chen, X., Urban, A. E., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (3): 366-374

    Abstract

    Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs)--large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.

    View details for DOI 10.1016/j.sbi.2008.02.005

    View details for Web of Science ID 000257539100013

    View details for PubMedID 18511261

  • Leptin affects endocardial cushion formation by modulating EMT and migration via Akt signaling cascades JOURNAL OF CELL BIOLOGY Nath, A. K., Brown, R. M., Michaud, M., Sierra-Honigmann, M. R., Snyder, M., Madri, J. A. 2008; 181 (2): 367-380

    Abstract

    Blood circulation is dependent on heart valves to direct blood flow through the heart and great vessels. Valve development relies on epithelial to mesenchymal transition (EMT), a central feature of embryonic development and metastatic cancer. Abnormal EMT and remodeling contribute to the etiology of several congenital heart defects. Leptin and its receptor were detected in the mouse embryonic heart. Using an ex vivo model of cardiac EMT, the inhibition of leptin results in a signal transducer and activator of transcription 3 and Snail/vascular endothelial cadherin-independent decrease in EMT and migration. Our data suggest that an Akt signaling pathway underlies the observed phenotype. Furthermore, loss of leptin phenocopied the functional inhibition of alphavbeta3 integrin receptor and resulted in decreased alphavbeta3 integrin and matrix metalloprotease 2, suggesting that the leptin signaling pathway is involved in adhesion and migration processes. This study adds leptin to the repertoire of factors that mediate EMT and, for the first time, demonstrates a role for the interleukin 6 family in embryonic EMT.

    View details for Web of Science ID 000255410300018

    View details for PubMedID 18411306

  • Myo2p, a class V myosin in budding yeast, associates with a large ribonucleic acid-protein complex that contains mRNAs and subunits of the RNA-processing body RNA-A PUBLICATION OF THE RNA SOCIETY Chang, W., Zaarour, R. F., Reck-Peterson, S., Rinn, J., Singer, R. H., Snyder, M., Novick, P., Mooseker, M. S. 2008; 14 (3): 491-502

    Abstract

    Myo2p is an essential class V myosin in budding yeast with several identified functions in organelle trafficking and spindle orientation. The present study demonstrates that Myo2p is a component of a large RNA-containing complex (Myo2p-RNP) that is distinct from polysomes based on sedimentation analysis and lack of ribosomal subunits in the Myo2p-RNP. Microarray analysis of RNAs that coimmunoprecipitate with Myo2p revealed the presence of a large number of mRNAs in this complex. The Myo2p-RNA complex is in part composed of the RNA processing body (P-body) based on coprecipitation with P-body protein subunits and partial colocalization of Myo2p with P-bodies. P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.

    View details for DOI 10.1261/rna.665008

    View details for Web of Science ID 000253565400012

    View details for PubMedID 18218704

  • Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome GENOME BIOLOGY QianWu, J., Du, J., Rozowsky, J., Zhang, Z., Urban, A. E., Euskirchen, G., Weissman, S., Gerstein, M., Snyder, M. 2008; 9 (1)

    Abstract

    Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

    View details for DOI 10.1186/gb-2008-9-1-r3

    View details for Web of Science ID 000253779800011

    View details for PubMedID 18173853

  • The development of protein microarrays and their applications in DNA-protein and protein-protein interaction analyses of Arabidopsis transcription factors MOLECULAR PLANT Gong, W., He, K., Covington, M., Dinesh-Kumar, S. P., Snyder, M., Harmer, S. L., Zhu, Y., Deng, X. W. 2008; 1 (1): 27-41

    Abstract

    We used our collection of Arabidopsis transcription factor (TF) ORFeome clones to construct protein microarrays containing as many as 802 TF proteins. These protein microarrays were used for both protein-DNA and protein-protein interaction analyses. For protein-DNA interaction studies, we examined AP2/ERF family TFs and their cognate cis-elements. By careful comparison of the DNA-binding specificity of 13 TFs on the protein microarray with previous non-microarray data, we showed that protein microarrays provide an efficient and high throughput tool for genome-wide analysis of TF-DNA interactions. This microarray protein-DNA interaction analysis allowed us to derive a comprehensive view of DNA-binding profiles of AP2/ERF family proteins in Arabidopsis. It also revealed four TFs that bound the EE (evening element) and had the expected phased gene expression under clock-regulation, thus providing a basis for further functional analysis of their roles in clock regulation of gene expression. We also developed procedures for detecting protein interactions using this TF protein microarray and discovered four novel partners that interact with HY5, which can be validated by yeast two-hybrid assays. Thus, plant TF protein microarrays offer an attractive high-throughput alternative to traditional techniques for TF functional characterization on a global scale.

    View details for DOI 10.1093/mp/ssm009

    View details for Web of Science ID 000259068900005

    View details for PubMedID 19802365

  • RNA polymerase II stalling: loading at the start prepares genes for a sprint GENOME BIOLOGY Wu, J. Q., Snyder, M. 2008; 9 (5)

    Abstract

    Stalling of RNA polymerase II near the promoter has recently been found to be much more common than previously thought. Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.

    View details for DOI 10.1186/gb-2008-9-5-220

    View details for Web of Science ID 000257564800002

    View details for PubMedID 18466645

  • Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Hudson, M. E., Pozdnyakova, I., Haines, K., Mor, G., Snyder, M. 2007; 104 (44): 17494-17499

    Abstract

    Ovarian cancer is a leading cause of death, yet many aspects of the biology of the disease and a routine means of its detection are lacking. We have used protein microarrays and autoantibodies from cancer patients to identify proteins that are aberrantly expressed in ovarian tissue. Sera from 30 cancer patients and 30 healthy individuals were used to probe microarrays containing 5,005 human proteins. Ninety-four antigens were identified that exhibited enhanced reactivity from sera in cancer patients relative to control sera. The differential reactivity of four antigens was tested by using immunoblot analysis and tissue microarrays. Lamin A/C, SSRP1, and RALBP1 were found to exhibit increased expression in the cancer tissue relative to controls. The combined signals from multiple antigens proved to be a robust test to identify cancerous ovarian tissue. These antigens were also reactive with tissue from other types of cancer and thus are not specific to ovarian cancer. Overall our studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.

    View details for DOI 10.1073/pnas.0708572104

    View details for Web of Science ID 000250638400048

    View details for PubMedID 17954908

  • Paired-end mapping reveals extensive structural variation in the human genome SCIENCE Korbel, J. O., Urban, A. E., Affourtit, J. P., Godwin, B., Grubert, F., Simons, J. F., Kim, P. M., Palejev, D., Carriero, N. J., Du, L., Taillon, B. E., Chen, Z., Tanzer, A., Saunders, A. C., Chi, J., Yang, F., Carter, N. P., Hurles, M. E., Weissman, S. M., Harkins, T. T., Gerstein, M. B., Egholm, M., Snyder, M. 2007; 318 (5849): 420-426

    Abstract

    Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

    View details for DOI 10.1126/science.1149504

    View details for Web of Science ID 000250230400038

    View details for PubMedID 17901297

  • Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms FUNCTIONAL & INTEGRATIVE GENOMICS Borneman, A. R., Zhang, Z. D., Rozowsky, J., Seringhaus, M. R., Gerstein, M., Snyder, M. 2007; 7 (4): 335-345

    Abstract

    In recent years, techniques have been developed to map transcription factor binding sites using chromatin immunoprecipitation combined with DNA microarrays (chIP chip). Initially, polymerase chain reaction (PCR)-based DNA arrays were used for the chIP chip procedure; however, high-density oligonucleotide (HDO) arrays, which allow for the production of thousands more features per array, have emerged as a competing array platform. To compare the two platforms, data from chIP chip analysis performed for three factors (Tec1, Ste12, and Sok2) using both HDO and PCR arrays under identical experimental conditions were compared. HDO arrays provided increased reproducibility and sensitivity, detecting approximately three times more binding events than the PCR arrays while also showing increased accuracy. The increased resolution provided by the HDO arrays also allowed for the identification of multiple binding peaks in close proximity and of novel binding events such as binding within ORFs. The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.

    View details for DOI 10.1007/s10142-007-0054-7

    View details for Web of Science ID 000249808300006

    View details for PubMedID 17638031

  • Arabidopsis protein microarrays for the high-throughput identification of protein-protein interactions. Plant signaling & behavior Popescu, S. C., Snyder, M., Dinesh-Kumar, S. 2007; 2 (5): 416-420

    Abstract

    Protein microarray technology has emerged as a powerful new approach for the study of thousands of proteins simultaneously. Protein microarrays have been used for a wide variety of applications for the human and yeast systems. In a recent study, we demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins. The arrayed proteins were produced using an optimized large-scale plant-based expression system. In a proof-of-concept study, 173 known and novel potential substrates of calmodulin (CaM) and calmodulin-like proteins (CML) were identified in an unbiased and high-throughput manner. The information documented here on novel potential CaM targets provides new testable hypotheses in the area of CaM/Ca(2+)-regulated processes and represents a resource of functional information for the scientific community.

    View details for PubMedID 19704619

  • Divergence of transcription factor binding sites across related yeast species SCIENCE Borneman, A. R., Gianoulis, T. A., Zhang, Z. D., Yu, H., Rozowsky, J., Seringhaus, M. R., Wang, L. Y., Gerstein, M., Snyder, M. 2007; 317 (5839): 815-819

    Abstract

    Characterization of interspecies differences in gene regulation is crucial for understanding the molecular basis of both phenotypic diversity and evolution. By means of chromatin immunoprecipitation and DNA microarray analysis, the divergence in the binding sites of the pseudohyphal regulators Ste12 and Tec1 was determined in the yeasts Saccharomyces cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions. We have shown that most of these sites have diverged across these species, far exceeding the interspecies variation in orthologous genes. A group of Ste12 targets was shown to be bound only in S. mikatae and S. bayanus under pseudohyphal conditions. Many of these genes are targets of Ste12 during mating in S. cerevisiae, indicating that specialization between the two pathways has occurred in this species. Transcription factor binding sites have therefore diverged substantially faster than ortholog content. Thus, gene regulation resulting from transcription factor binding is likely to be a major cause of divergence between related species.

    View details for DOI 10.1126/science.1140748

    View details for Web of Science ID 000248624500044

    View details for PubMedID 17690298

  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project NATURE Birney, E., Stamatoyannopoulos, J. A., Dutta, A., Guigo, R., Gingeras, T. R., Margulies, E. H., Weng, Z., Snyder, M., Dermitzakis, E. T., Stamatoyannopoulos, J. A., Thurman, R. E., Kuehn, M. S., Taylor, C. M., Neph, S., Koch, C. M., Asthana, S., Malhotra, A., Adzhubei, I., Greenbaum, J. A., Andrews, R. M., Flicek, P., Boyle, P. J., Cao, H., Carter, N. P., Clelland, G. K., Davis, S., Day, N., Dhami, P., Dillon, S. C., Dorschner, M. O., Fiegler, H., Giresi, P. G., Goldy, J., Hawrylycz, M., Haydock, A., Humbert, R., James, K. D., Johnson, B. E., Johnson, E. M., Frum, T. T., Rosenzweig, E. R., Karnani, N., Lee, K., Lefebvre, G. C., Navas, P. A., Neri, F., Parker, S. C., Sabo, P. J., Sandstrom, R., Shafer, A., Vetrie, D., Weaver, M., Wilcox, S., Yu, M., Collins, F. S., Dekker, J., Lieb, J. D., Tullius, T. D., Crawford, G. E., Sunyaev, S., Noble, W. S., Dunham, I., Dutta, A., Guigo, R., Denoeud, F., Reymond, A., Kapranov, P., Rozowsky, J., Zheng, D., Castelo, R., Frankish, A., Harrow, J., Ghosh, S., Sandelin, A., Hofacker, I. L., Baertsch, R., Keefe, D., Flicek, P., Dike, S., Cheng, J., Hirsch, H. A., Sekinger, E. A., Lagarde, J., Abril, J. F., Shahab, A., Flamm, C., Fried, C., Hackermueller, J., Hertel, J., Lindemeyer, M., Missal, K., Tanzer, A., Washietl, S., Korbel, J., Emanuelsson, O., Pedersen, J. S., Holroyd, N., Taylor, R., Swarbreck, D., Matthews, N., Dickson, M. C., Thomas, D. J., Weirauch, M. T., Gilbert, J., Drenkow, J., Bell, I., Zhao, X., Srinivasan, K. G., Sung, W., Ooi, H. S., Chiu, K. P., Foissac, S., Alioto, T., Brent, M., Pachter, L., Tress, M. L., Valencia, A., Choo, S. W., Choo, C. Y., Ucla, C., Manzano, C., Wyss, C., Cheung, E., Clark, T. G., Brown, J. B., Ganesh, M., Patel, S., Tammana, H., Chrast, J., Henrichsen, C. N., Kai, C., Kawai, J., Nagalakshmi, U., Wu, J., Lian, Z., Lian, J., Newburger, P., Zhang, X., Bickel, P., Mattick, J. S., Carninci, P., Hayashizaki, Y., Weissman, S., Dermitzakis, E. T., Margulies, E. H., Hubbard, T., Myers, R. M., Rogers, J., Stadler, P. F., Lowe, T. M., Wei, C., Ruan, Y., Snyder, M., Birney, E., Struhl, K., Gerstein, M., Antonarakis, S. E., Gingeras, T. R., Brown, J. B., Flicek, P., Fu, Y., Keefe, D., Birney, E., Denoeud, F., Gerstein, M., Green, E. D., Kapranov, P., Karaoez, U., Myers, R. M., Noble, W. S., Reymond, A., Rozowsky, J., Struhl, K., Siepel, A., Stamatoyannopoulos, J. A., Taylor, C. M., Taylor, J., Thurman, R. E., Tullius, T. D., Washietl, S., Zheng, D., Liefer, L. A., Wetterstrand, K. A., Good, P. J., Feingold, E. A., Guyer, M. S., Collins, F. S., Margulies, E. H., Cooper, G. M., Asimenos, G., Thomas, D. J., Dewey, C. N., Siepel, A., Birney, E., Keefe, D., Hou, M., Taylor, J., Nikolaev, S., Montoya-Burgos, J. I., Loeytynoja, A., Whelan, S., Pardi, F., Massingham, T., Brown, J. B., Huang, H., Zhang, N. R., Bickel, P., Holmes, I., Mullikin, J. C., Ureta-Vidal, A., Paten, B., Seringhaus, M., Church, D., Rosenbloom, K., Kent, W. J., Stone, E. A., Gerstein, M., Antonarakis, S. E., Batzoglou, S., Goldman, N., Hardison, R. C., Haussler, D., Miller, W., Pachter, L., Green, E. D., Sidow, A., Weng, Z., Trinklein, N. D., Fu, Y., Zhang, Z. D., Karaoez, U., Barrera, L., Stuart, R., Zheng, D., Ghosh, S., Flicek, P., King, D. C., Taylor, J., Ameur, A., Enroth, S., Bieda, M. C., Koch, C. M., Hirsch, H. A., Wei, C., Cheng, J., Kim, J., Bhinge, A. A., Giresi, P. G., Jiang, N., Liu, J., Yao, F., Sung, W., Chiu, K. P., Vega, V. B., Lee, C. W., Ng, P., Shahab, A., Sekinger, E. A., Yang, A., Moqtaderi, Z., Zhu, Z., Xu, X., Squazzo, S., Oberley, M. J., Inman, D., Singer, M. A., Richmond, T. A., Munn, K. J., Rada-Iglesias, A., Wallerman, O., Komorowski, J., Clelland, G. K., Wilcox, S., Dillon, S. C., Andrews, R. M., Fowler, J. C., Couttet, P., James, K. D., Lefebvre, G. C., Bruce, A. W., Dovey, O. M., Ellis, P. D., Dhami, P., Langford, C. F., Carter, N. P., Vetrie, D., Kapranov, P., Nix, D. A., Bell, I., Patel, S., Rozowsky, J., Euskirchen, G., Hartman, S., Lian, J., Wu, J., Urban, A. E., Kraus, P., Van Calcar, S., Heintzman, N., Kim, T. H., Wang, K., Qu, C., Hon, G., Luna, R., Glass, C. K., Rosenfeld, M. G., Force Aldred, S., Cooper, S. J., Halees, A., Lin, J. M., Shulha, H. P., Zhang, X., Xu, M., Haidar, J. N., Yu, Y., Birney, E., Weissman, S., Ruan, Y., Lieb, J. D., Iyer, V. R., Green, R. D., Gingeras, T. R., Wadelius, C., Dunham, I., Struhl, K., Hardison, R. C., Gerstein, M., Farnham, P. J., Myers, R. M., Ren, B., Snyder, M., Thomas, D. J., Rosenbloom, K., Harte, R. A., Hinrichs, A. S., Trumbower, H., Clawson, H., Hillman-Jackson, J., Zweig, A. S., Smith, K., Thakkapallayil, A., Barber, G., Kuhn, R. M., Karolchik, D., Haussler, D., Kent, W. J., Dermitzakis, E. T., Armengol, L., Bird, C. P., Clark, T. G., Cooper, G. M., de Bakker, P. I., Kern, A. D., Lopez-Bigas, N., Martin, J. D., Stranger, B. E., Thomas, D. J., Woodroffe, A., Batzoglou, S., Davydov, E., Dimas, A., Eyras, E., Hallgrimsdottir, I. B., Hardison, R. C., Huppert, J., Sidow, A., Taylor, J., Trumbower, H., Zody, M. C., Guigo, R., Mullikin, J. C., Abecasis, G. R., Estivill, X., Birney, E., Bouffard, G. G., Guan, X., Hansen, N. F., Idol, J. R., Maduro, V. V., Maskeri, B., McDowell, J. C., Park, M., Thomas, P. J., Young, A. C., Blakesley, R. W., Muzny, D. M., Sodergren, E., Wheeler, D. A., Worley, K. C., Jiang, H., Weinstock, G. M., Gibbs, R. A., Graves, T., Fulton, R., Mardis, E. R., Wilson, R. K., Clamp, M., Cuff, J., Gnerre, S., Jaffe, D. B., Chang, J. L., Lindblad-Toh, K., Lander, E. S., Koriabine, M., Nefedov, M., Osoegawa, K., Yoshinaga, Y., Zhu, B., de Jong, P. J. 2007; 447 (7146): 799-816

    Abstract

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

    View details for DOI 10.1038/nature05874

    View details for Web of Science ID 000247207500034

    View details for PubMedID 17571346

    View details for PubMedCentralID PMC2212820

  • Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Urban, A. E., Grubert, F., Du, J., Royce, T. E., Starr, P., Zhong, G., Emanuel, B. S., Weissman, S. M., Snyder, M., Gerstein, M. B. 2007; 104 (24): 10110-10115

    Abstract

    Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.

    View details for DOI 10.1073/pnas.0703834104

    View details for Web of Science ID 000247363000036

    View details for PubMedID 17551006

  • Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome GENOME RESEARCH Emanuelsson, O., Nagalakshmi, U., Zheng, D., Rozowsky, J. S., Urban, A. E., Du, J., Lian, Z., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. B. 2007; 17 (6): 886-897

    Abstract

    Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

    View details for DOI 10.1101/gr.5014606

    View details for Web of Science ID 000247226900020

    View details for PubMedID 17119069

  • Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE) GENOME RESEARCH Bhinge, A. A., Kim, J., Euskirchen, G. M., Snyder, M., Iyer, V. R. 2007; 17 (6): 910-916

    Abstract

    Identifying the genome-wide binding sites of transcription factors is important in deciphering transcriptional regulatory networks. ChIP-chip (Chromatin immunoprecipitation combined with microarrays) has been widely used to map transcription factor binding sites in the human genome. However, whole genome ChIP-chip analysis is still technically challenging in vertebrates. We recently developed STAGE as an unbiased method for identifying transcription factor binding sites in the genome. STAGE is conceptually based on SAGE, except that the input is ChIP-enriched DNA. In this study, we implemented an improved sequencing strategy and analysis methods and applied STAGE to map the genomic binding profile of the transcription factor STAT1 after interferon treatment. STAT1 is mainly responsible for mediating the cellular responses to interferons, such as cell proliferation, apoptosis, immune surveillance, and immune responses. We present novel algorithms for STAGE tag analysis to identify enriched loci with high specificity, as verified by quantitative ChIP. STAGE identified several previously unknown STAT1 target genes, many of which are involved in mediating the response to interferon-gamma signaling. STAGE is thus a viable method for identifying the chromosomal targets of transcription factors and generating meaningful biological hypotheses that further our understanding of transcriptional regulatory networks.

    View details for DOI 10.1101/gr.5574907

    View details for Web of Science ID 000247226900022

    View details for PubMedID 17568006

  • Structured RNAs in the ENCODE selected regions of the human genome GENOME RESEARCH Washietl, S., Pedersen, J. S., Korbel, J. O., Stocsits, C., Gruber, A. R., Hackermueller, J., Hertel, J., Lindemeyer, M., Reiche, K., Tanzer, A., Ucla, C., Wyss, C., Antonarakis, S. E., Denoeud, F., Lagarde, J., Drenkow, J., Kapranov, P., Gingeras, T. R., Guigo, R., Snyder, M., Gerstein, M. B., Reymond, A., Hofacker, I. L., Stadler, P. F. 2007; 17 (6): 852-864

    Abstract

    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).

    View details for DOI 10.1101/gr.5650707

    View details for Web of Science ID 000247226900017

    View details for PubMedID 17568003

  • Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies GENOME RESEARCH Euskirchen, G. M., Rozowsky, J. S., Wei, C., Lee, W. H., Zhang, Z. D., Hartman, S., Emanuelsson, O., Stolc, V., Weissman, S., Gerstein, M. B., Ruan, Y., Snyder, M. 2007; 17 (6): 898-909

    Abstract

    Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.

    View details for DOI 10.1101/gr.5583007

    View details for Web of Science ID 000247226900021

    View details for PubMedID 17568005

  • Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome GENOME RESEARCH Trinklein, N. D., Karaoz, U., Wu, J., Halees, A., Aldred, S. F., Collins, P. J., Zheng, D., Zhang, Z. D., Gerstein, M. B., Snyder, M., Myers, R. M., Weng, Z. 2007; 17 (6): 720-731

    Abstract

    The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.

    View details for DOI 10.1101/gr.5716607

    View details for Web of Science ID 000247226900006

    View details for PubMedID 17567992

    View details for PubMedCentralID PMC1891333

  • Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions GENOME RESEARCH Zhang, Z. D., Paccanaro, A., Fu, Y., Weissman, S., Weng, Z., Chang, J., Snyder, M., Gerstein, M. B. 2007; 17 (6): 787-797

    Abstract

    The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10- to 100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich "islands" and poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.

    View details for DOI 10.1101/gr.5573107

    View details for Web of Science ID 000247226900011

    View details for PubMedID 17567997

  • The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci GENOME RESEARCH Rozowsky, J. S., Newburger, D., Sayward, F., Wu, J., Jordan, G., Korbel, J. O., Nagalakshmi, U., Yang, J., Zheng, D., Guigo, R., Gingeras, T. R., Weissman, S., Miller, P., Snyder, M., Gerstein, M. B. 2007; 17 (6): 732-745

    Abstract

    For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.

    View details for DOI 10.1101/gr.5696007

    View details for Web of Science ID 000247226900007

    View details for PubMedID 17567993

  • What is a gene, post-ENCODE? History and updated definition GENOME RESEARCH Gerstein, M. B., Bruce, C., Rozowsky, J. S., Zheng, D., Du, J., Korbel, J. O., Emanuelsson, O., Zhang, Z. D., Weissman, S., Snyder, M. 2007; 17 (6): 669-681

    Abstract

    While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century, from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.

    View details for DOI 10.1101/gr.6339607

    View details for Web of Science ID 000247226900002

    View details for PubMedID 17567988

  • Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution GENOME RESEARCH Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Choo, S. W., Lu, Y., Denoeud, F., Antonarakis, S. E., Snyder, M., Ruan, Y., Wei, C., Gingeras, T. R., Guigo, R., Harrow, J., Gerstein, M. B. 2007; 17 (6): 839-851

    Abstract

    Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

    View details for DOI 10.1101/gr.5586307

    View details for Web of Science ID 000247226900016

    View details for PubMedID 17568002

  • Getting connected: analysis and principles of biological networks GENES & DEVELOPMENT Zhu, X., Gerstein, M., Snyder, M. 2007; 21 (9): 1010-1024

    Abstract

    The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.

    View details for DOI 10.1101/gad.1528707

    View details for Web of Science ID 000246154100002

    View details for PubMedID 17473168

  • Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Seay, M., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2007; 104 (11): 4730-4735

    Abstract

    Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to bind CaMs based on their structural homology with known targets. However, multicellular organisms typically contain many CaM-like (CML) proteins, and a global identification of their targets and specificity of interaction is lacking. In an effort to develop a platform for large-scale analysis of proteins in plants we have developed a protein microarray and used it to study the global analysis of CaM/CML interactions. An Arabidopsis thaliana expression collection containing 1,133 ORFs was generated and used to produce proteins with an optimized medium-throughput plant-based expression system. Protein microarrays were prepared and screened with several CaMs/CMLs. A large number of previously known and novel CaM/CML targets were identified, including transcription factors, receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Multiple CaM/CML proteins bound many binding partners, but the majority of targets were specific to one or a few CaMs/CMLs indicating that different CaM family members function through different targets. Based on our analyses, the emergent CaM/CML interactome is more extensive than previously predicted. Our results suggest that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.

    View details for DOI 10.1073/pnas.0611615104

    View details for Web of Science ID 000244972700086

    View details for PubMedID 17360592

  • New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis GENES & DEVELOPMENT Smith, M. G., Gianoulis, T. A., Pukatzki, S., Mekalanos, J. J., Ornston, L. N., Gerstein, M., Snyder, M. 2007; 21 (5): 601-614

    Abstract

    Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary tract infections. We explored the pathogenic content of this harmful pathogen using a combination of DNA sequencing and insertional mutagenesis. The genome of this organism was sequenced using a strategy involving high-density pyrosequencing, a novel, rapid method of high-throughput sequencing. Excluding the rDNA repeats, the assembled genome is 3,976,746 base pairs (bp) and has 3830 ORFs. A significant fraction of ORFs (17.2%) are located in 28 putative alien islands, indicating that the genome has acquired a large amount of foreign DNA. Consistent with its role in pathogenesis, a remarkable number of the islands (16) contain genes implicated in virulence, indicating the organism devotes a considerable portion of its genes to pathogenesis. The largest island contains elements homologous to the Legionella/Coxiella Type IV secretion apparatus. Type IV secretion systems have been demonstrated to be important for virulence in other organisms and thus are likely to help mediate pathogenesis of A. baumannii. Insertional mutagenesis generated avirulent isolates of A. baumannii and verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases. The DNA sequencing approach described in this study allows the rapid elucidation of the DNA sequence of any microbe and, when combined with genetic screens, can identify many novel genes important for microbial pathogenesis.

    View details for DOI 10.1101/gad.1510307

    View details for Web of Science ID 000244760600011

    View details for PubMedID 17344419

  • Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool NUCLEIC ACIDS RESEARCH Yu, H., Nguyen, K., Royce, T., Qian, J., Nelson, K., Snyder, M., Gerstein, M. 2007; 35 (2)

    Abstract

    Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able to show that there are two types of positional artifacts in microarray data introducing spurious correlations between genes. First, we find that genes that are close on the microarray chips tend to have higher correlations between their expression profiles. We call this the 'chip artifact'. Our calculations suggest that the carry-over during the printing process is one of the major sources of this type of artifact, which is later confirmed by our experiments. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully-hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Secondly, we, for the first time, show that genes that are close on the microtiter plates in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist with different severity in all cDNA microarray experiments that we analyzed. Therefore, we develop an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool, ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common noise in microarray data.

    View details for DOI 10.1093/nar/gkl871

    View details for Web of Science ID 000243993600001

    View details for PubMedID 17158151

  • Yeast protein microarrays YEAST GENE ANALYSIS, SECOND EDITION Ptacek, J., Snyder, M. 2007; 36: 303-?
  • Protein microarray technology 3rd International Conference on Functional Genomics of Ageing Hall, D. A., Ptacek, J., Snyder, M. ELSEVIER IRELAND LTD. 2007: 161–67

    Abstract

    Protein chips have emerged as a promising approach for a wide variety of applications including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of protein kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays.

    View details for DOI 10.1016/j.mad.2006.11.021

    View details for Web of Science ID 000244301700024

    View details for PubMedID 17126887

  • Tilescope: online analysis pipeline for high-density tiling microarray data GENOME BIOLOGY Zhang, Z. D., Rozowsky, J., Lam, H. Y., Du, J., Snyder, M., Gerstein, M. 2007; 8 (5)

    Abstract

    We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data (http://tilescope.gersteinlab.org). In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.

    View details for DOI 10.1186/gb-2007-8-5-r81

    View details for Web of Science ID 000246983100034

    View details for PubMedID 17501994

  • A supervised hidden Markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP-chip experiments: systematically incorporating validated biological knowledge BIOINFORMATICS Du, J., Rozowsky, J. S., Korbel, J. O., Zhang, Z. D., Royce, T. E., Schultz, M. H., Snyder, M., Gerstein, M. 2006; 22 (24): 3016-3024

    Abstract

    Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion mainly based on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for doing this. It has the advantage of explicitly incorporating validated biological knowledge into the model and allowing for formal training and testing. In particular, we use a hidden Markov model (HMM) framework, which is capable of explicitly modeling the dependency between neighboring probes and whose extended version (the generalized HMM) also allows explicit description of state duration density. We introduce a formal definition of the tiling-array analysis problem, and explain how we can use this to describe sampling small genomic regions for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotation or known promoters as positives for transcription or ChIP-chip data, respectively). For the practical sampling and training strategies, we show how the size and noise in the validated training data affects the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework is able to efficiently process tiling array data as well as or better than previous approaches. For the idealized sampling strategies, we show how we can assess their performance in a simulation framework and how a maximum entropy approach, which samples sub-regions with very different signal intensities, gives the maximally performing gold-standard. This latter result has strong implications for the optimum way medium-scale validation experiments should be carried out to verify the results of the genome-scale tiling array experiments.

    View details for DOI 10.1093/bioinformatics/btl515

    View details for Web of Science ID 000242715200008

    View details for PubMedID 17038339
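
    The supervised segmentation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-state HMM (0 = background, 1 = active region), Gaussian emissions per state, and a uniform initial state distribution, and the function names (train_supervised, viterbi) are hypothetical. Parameters are estimated from probes with validated labels, and new probe signals are then decoded with the standard Viterbi algorithm.

```python
import math

STATES = (0, 1)  # assumed states: 0 = background, 1 = active region


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def train_supervised(signals, labels):
    """Estimate emissions and transitions from probes with validated labels."""
    emissions = {}
    for s in STATES:
        values = [x for x, lab in zip(signals, labels) if lab == s]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        emissions[s] = (mean, max(math.sqrt(var), 1e-3))  # floor avoids zero variance
    # Transition counts between neighboring probes, with add-one smoothing.
    counts = {a: {b: 1.0 for b in STATES} for a in STATES}
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    trans = {a: {b: counts[a][b] / sum(counts[a].values()) for b in STATES}
             for a in STATES}
    return emissions, trans


def viterbi(signals, emissions, trans):
    """Return the most likely state path for a sequence of probe signals."""
    score = {s: math.log(0.5) + gaussian_logpdf(signals[0], *emissions[s])
             for s in STATES}
    back = []
    for x in signals[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + gaussian_logpdf(x, *emissions[s]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

    Training on a labeled stretch of probes and then calling viterbi on unlabeled signals yields one state per probe; runs of state 1 would correspond to candidate active regions. The generalized HMM with explicit state durations mentioned in the abstract is not covered by this sketch.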

  • High-throughput methods of regulatory element discovery BIOTECHNIQUES Hudson, M. E., Snyder, M. 2006; 41 (6): 673-?

    Abstract

    With the number of organisms whose genomes have been sequenced, a vast amount of information concerning the genetic structure of an organism's genome has been collected. However, effective experimental means to study how this information is accessed have only recently been developed. In this review, three basic methods for identifying regions of protein-DNA interaction will be introduced. The first two, chromatin immunoprecipitation (ChIP)-chip and ChIP-PET (for paired-end ditag), rely on the enrichment provided by chromatin immunoprecipitation to interrogate the genomic sequence for the interaction sites of a protein of interest. In contrast, protein microarrays allow the identification of DNA-binding proteins that interact with a DNA sequence of interest. These complementary methods of exploring protein-DNA interactions will increase our fundamental knowledge of how the information contained within the genome sequence is accessed and processed.

    View details for Web of Science ID 000242737100019

    View details for PubMedID 17191608

  • HTRA1 promoter polymorphism in wet age-related macular degeneration SCIENCE DeWan, A., Liu, M., Hartman, S., Zhang, S. S., Liu, D. T., Zhao, C., Tam, P. O., Chan, W. M., Lam, D. S., Snyder, M., Barnstable, C., Pang, C. P., Hoh, J. 2006; 314 (5801): 989-992

    Abstract

    Age-related macular degeneration (AMD), the most common cause of irreversible vision loss in individuals aged older than 50 years, is classified as either wet (neovascular) or dry (nonneovascular). Inherited variation in the complement factor H gene is a major risk factor for drusen in dry AMD. Here we report that a single-nucleotide polymorphism in the promoter region of HTRA1, a serine protease gene on chromosome 10q26, is a major genetic risk factor for wet AMD. A whole-genome association mapping strategy was applied to a Chinese population, yielding a P value of <10^-11. Individuals with the risk-associated genotype were estimated to have a likelihood of developing wet AMD 10 times that of individuals with the wild-type genotype.

    View details for DOI 10.1126/science.1133807

    View details for Web of Science ID 000241896000052

    View details for PubMedID 17053108

  • Charging it up: global analysis of protein phosphorylation TRENDS IN GENETICS Ptacek, J., Snyder, M. 2006; 22 (10): 545-554

    Abstract

    Protein phosphorylation affects most, if not all, cellular activities in eukaryotes and is essential for cell proliferation and development. An estimated 30% of cellular proteins are phosphorylated, representing the phosphoproteome, and phosphorylation can alter a protein's function, activity, localization and stability. Recent studies for large-scale identification of phosphosites using mass spectrometry are revealing the components of the phosphoproteome. The development of new tools, such as kinase assays using modified kinases or protein microarrays, enables rapid kinase substrate identification. The dynamics of specific phosphorylation events can now be monitored using mass spectrometry, single-cell analysis of flow cytometry, or fluorescent reporters. Together, these techniques are beginning to elucidate cellular processes and pathways regulated by phosphorylation, in addition to global regulatory networks.

    View details for DOI 10.1016/j.tig.2006.08.005

    View details for Web of Science ID 000241268400006

    View details for PubMedID 16908088

  • TOS9 regulates white-opaque switching in Candida albicans EUKARYOTIC CELL Srikantha, T., Borneman, A. R., Daniels, K. J., Pujol, C., Wu, W., Seringhaus, M. R., Gerstein, M., Yi, S., Snyder, M., Soll, D. R. 2006; 5 (10): 1674-1687

    Abstract

    In Candida albicans, the a1-alpha2 complex represses white-opaque switching, as well as mating. Based upon the assumption that the a1-alpha2 corepressor complex binds to the gene that regulates white-opaque switching, a chromatin immunoprecipitation-microarray analysis strategy was used to identify 52 genes that bound to the complex. One of these genes, TOS9, exhibited an expression pattern consistent with a "master switch gene." TOS9 was only expressed in opaque cells, and its gene product, Tos9p, localized to the nucleus. Deletion of the gene blocked cells in the white phase, misexpression in the white phase caused stable mass conversion of cells to the opaque state, and misexpression blocked temperature-induced mass conversion from the opaque state to the white state. A model was developed for the regulation of spontaneous switching between the opaque state and the white state that includes stochastic changes of Tos9p levels above and below a threshold that induce changes in the chromatin state of an as-yet-unidentified switching locus. TOS9 has also been referred to as EAP2 and WOR1.

    View details for DOI 10.1128/EC.00252-06

    View details for Web of Science ID 000241344300010

    View details for PubMedID 16950924

  • Predicting essential genes in fungal genomes GENOME RESEARCH Seringhaus, M., Paccanaro, A., Borneman, A., Snyder, M., Gerstein, M. 2006; 16 (9): 1126-1135

    Abstract

    Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through computational methods is appealing because it circumvents expensive and difficult experimental screens. Most such prediction is based on homology mapping to experimentally verified essential genes in model organisms. We present here a different approach, one that relies exclusively on sequence features of a gene to estimate essentiality and offers a promising way to identify essential genes in unstudied or uncultured organisms. We identified 14 characteristic sequence features potentially associated with essentiality, such as localization signals, codon adaptation, GC content, and overall hydrophobicity. Using the well-characterized baker's yeast Saccharomyces cerevisiae, we employed a simple Bayesian framework to measure the correlation of each of these features with essentiality. We then employed the 14 features to learn the parameters of a machine learning classifier capable of predicting essential genes. We trained our classifier on known essential genes in S. cerevisiae and applied it to the closely related and relatively unstudied yeast Saccharomyces mikatae. We assessed predictive success in two ways: First, we compared all of our predictions with those generated by homology mapping between these two species. Second, we verified a subset of our predictions with eight in vivo knockouts in S. mikatae, and we present here the first experimentally confirmed essential genes in this species.

    View details for DOI 10.1101/gr.5144106

    View details for Web of Science ID 000240238600007

    View details for PubMedID 16899653

  • Proteome chips for whole-organism assays NATURE REVIEWS MOLECULAR CELL BIOLOGY Kung, L. A., Snyder, M. 2006; 7 (8): 617-622

    Abstract

    Over the past 5 years, protein-chip technology has emerged as a useful tool for the study of many kinds of protein interactions and biochemical activities. The construction of Saccharomyces cerevisiae whole-proteome arrays has enabled further studies of such interactions in a proteome-wide context. Here, we explore some of the recent advances that have been made at the '-omic' level using protein microarrays.

    View details for DOI 10.1038/nrm1941

    View details for Web of Science ID 000239240000019

    View details for PubMedID 16723973

  • Linking DNA-binding proteins to their recognition sequences by using protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Ho, S., Jona, G., Chen, C. T., Johnston, M., Snyder, M. 2006; 103 (26): 9940-9945

    Abstract

    Analyses of whole-genome sequences and experimental data sets have revealed a large number of DNA sequence motifs that are conserved in many species and may be functional. However, methods of sufficient scale to explore the roles of these elements are lacking. We describe the use of protein arrays to identify proteins that bind to DNA sequences of interest. A microarray of 282 known and potential yeast transcription factors was produced and probed with oligonucleotides of evolutionarily conserved sequences that are potentially functional. Transcription factors that bound to specific DNA sequences were identified. One previously uncharacterized DNA-binding protein, Yjl103, was characterized in detail. We defined the binding site for this protein and identified a number of its target genes, many of which are involved in stress response and oxidative phosphorylation. Protein microarrays offer a high-throughput method for determining DNA-protein interactions.

    View details for DOI 10.1073/pnas.0509185103

    View details for Web of Science ID 000238872900036

    View details for PubMedID 16785442

  • Defined culture conditions of human embryonic stem cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lu, J., Hou, R. H., Booth, C. J., Yang, S. H., Snyder, M. 2006; 103 (15): 5688-5693

    Abstract

    Human embryonic stem cells (hESCs) are pluripotent cells that have the potential to differentiate into any tissue in the human body; therefore, they are a valuable resource for regenerative medicine, drug screening, and developmental studies. However, the clinical application of hESCs is hampered by the difficulties of eliminating animal products in the culture medium and/or the complexity of conditions required to support hESC growth. We have developed a simple medium [termed hESC Cocktail (HESCO)] containing basic fibroblast growth factor, Wnt3a, April (a proliferation-inducing ligand)/BAFF (B cell-activating factor belonging to TNF), albumin, cholesterol, insulin, and transferrin, which is sufficient for hESC self-renewal and proliferation. Cells grown in HESCO were maintained in an undifferentiated state as determined by using six different stem cell markers, and their genomic integrity was confirmed by karyotyping. Cells cultured in HESCO readily form embryoid bodies in tissue culture and teratomas in mice. In both cases, the cells differentiated into each of the three cell lineages, ectoderm, endoderm, and mesoderm, indicating that they maintained their pluripotency. The use of a minimal medium sufficient for hESC growth is expected to greatly facilitate clinical application and developmental studies of hESCs.

    View details for DOI 10.1073/pnas.0601383103

    View details for Web of Science ID 000236896200012

    View details for PubMedID 16595624

  • High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Urban, A. E., Korbel, J. O., Selzer, R., Richmond, T., Hacker, A., Popescu, G. V., Cubells, J. F., Green, R., Emanuel, B. S., Gerstein, M. B., Weissman, S. M., Snyder, M. 2006; 103 (12): 4534-4539

    Abstract

    Deletions and amplifications of the human genomic sequence (copy number polymorphisms) are the cause of numerous diseases and a potential cause of phenotypic variation in the normal population. Comparative genomic hybridization (CGH) has been developed as a useful tool for detecting alterations in DNA copy number that involve blocks of DNA several kilobases or larger in size. We have developed high-resolution CGH (HR-CGH) to detect accurately and with relatively little bias the presence and extent of chromosomal aberrations in human DNA. Maskless array synthesis was used to construct arrays containing 385,000 oligonucleotides with isothermal probes of 45-85 bp in length; arrays tiling the beta-globin locus and chromosome 22q were prepared. Arrays with a 9-bp tiling path were used to map a 622-bp heterozygous deletion in the beta-globin locus. Arrays with an 85-bp tiling path were used to analyze DNA from patients with copy number changes in the pericentromeric region of chromosome 22q. Heterozygous deletions and duplications as well as partial triploidies and partial tetraploidies of portions of chromosome 22q were mapped with high resolution (typically up to 200 bp) in each patient, and the precise breakpoints of two deletions were confirmed by DNA sequencing. Additional peaks potentially corresponding to known and novel additional CNPs were also observed. Our results demonstrate that HR-CGH allows the detection of copy number changes in the human genome at an unprecedented level of resolution.

    View details for DOI 10.1073/pnas.0511340103

    View details for Web of Science ID 000236362600039

    View details for PubMedID 16537408

  • Severe acute respiratory syndrome diagnostics using a coronavirus protein microarray PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, H., Hu, S. H., Jona, G., Zhu, X. W., Kreiswirth, N., Willey, B. M., Mazzulli, T., Liu, G. Z., Song, Q. F., Chen, P., Cameron, M., Tyler, A., Wang, J., Wen, J., Chen, W. J., Compton, S., Snyder, M. 2006; 103 (11): 4011-4016

    Abstract

    To monitor severe acute respiratory syndrome (SARS) infection, a coronavirus protein microarray that harbors proteins from SARS coronavirus (SARS-CoV) and five additional coronaviruses was constructed. These microarrays were used to screen approximately 400 Canadian sera from the SARS outbreak, including samples from confirmed SARS-CoV cases, respiratory illness patients, and healthcare professionals. A computer algorithm that uses multiple classifiers to predict samples from SARS patients was developed and used to predict 206 sera from Chinese fever patients. The test assigned patients into two distinct groups: those with antibodies to SARS-CoV and those without. The microarray also identified patients with sera reactive against other coronavirus proteins. Our results correlated well with an indirect immunofluorescence test and demonstrated that viral infection can be monitored for many months after infection. We show that protein microarrays can serve as a rapid, sensitive, and simple tool for large-scale identification of viral-specific antibodies in sera.

    View details for Web of Science ID 000236429300016

    View details for PubMedID 16537477

  • Target hub proteins serve as master regulators of development in yeast GENES & DEVELOPMENT Borneman, A. R., Leigh-Bell, J. A., Yu, H. Y., Bertone, P., Gerstein, M., Snyder, M. 2006; 20 (4): 435-448

    Abstract

    To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in Saccharomyces cerevisiae. The binding targets of Ste12, Tec1, Sok2, Phd1, Mga1, and Flo8 were globally mapped across the yeast genome. The factors and their targets form a complex binding network, containing patterns characteristic of autoregulation, feedback and feed-forward loops, and cross-talk. Combinatorial binding to intergenic regions was commonly observed, which allowed for the identification of a novel binding association between Mga1 and Flo8, in which Mga1 requires Flo8 for binding to promoter regions. Further analysis of the network showed that the promoters of MGA1 and PHD1 were bound by all of the factors used in this study, identifying them as key target hubs. Overexpression of either of these two proteins specifically induced pseudohyphal growth under noninducing conditions, highlighting them as master regulators of the system. Our results indicate that target hubs can serve as master regulators whose activity is sufficient for the induction of complex developmental responses and therefore represent important regulatory nodes in biological networks.

    View details for DOI 10.1101/gad.1389306

    View details for Web of Science ID 000235428600007

    View details for PubMedID 16449570

  • Mapping pathways and phenotypes by systematic gene overexpression MOLECULAR CELL Sopko, R., Huang, D. Q., Preston, N., Chua, G., Papp, B., Kafadar, K., Snyder, M., Oliver, S. G., Cyert, M., Hughes, T. R., Boone, C., Andrews, B. 2006; 21 (3): 319-330

    Abstract

    Many disease states result from gene overexpression, often in a specific genetic context. To explore gene overexpression phenotypes systematically, we assembled an array of 5280 yeast strains, each containing an inducible copy of an S. cerevisiae gene, covering >80% of the genome. Approximately 15% of the overexpressed genes (769) reduced growth rate. This gene set was enriched for cell cycle-regulated genes, signaling molecules, and transcription factors. Overexpression of most toxic genes resulted in phenotypes different from known deletion mutant phenotypes, suggesting that overexpression phenotypes usually reflect a specific regulatory imbalance rather than disruption of protein complex stoichiometry. Global overexpression effects were also assayed in the context of a cyclin-dependent kinase mutant (pho85Δ). The resultant gene set was enriched for Pho85p targets and identified the yeast calcineurin-responsive transcription factor Crz1p as a substrate. Large-scale application of this approach should provide a strategy for identifying target molecules regulated by specific signaling pathways.

    View details for DOI 10.1016/j.molcel.2005.12.011

    View details for Web of Science ID 000235436100003

    View details for PubMedID 16455487

  • Yeast as a model for human disease. Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] Smith, M. G., Snyder, M. 2006; Chapter 15: Unit 15 6-?

    Abstract

    The sequencing of the human genome promised the identification of disease-causing genes and, subsequently, therapies for those diseases. However, when identifying the genetic basis of a disease, it is not uncommon to discover an abnormal protein whose normal function is unknown. The genetic manipulations required to assign function to genes is often extremely difficult, if not impossible, in human cells. Model organisms have been used to facilitate understanding of gene function because of the ease of genetic manipulations and because many features of eukaryotic physiology have been conserved across phyla. Yeast is a simple eukaryote with a tractable genome, a short generation time, and a large network of researchers who have generated a vast arsenal of research tools. These traits make yeast ideally suited to help reveal the function of genes implicated in human disease.

    View details for DOI 10.1002/0471142905.hg1506s48

    View details for PubMedID 18428391

  • Design optimization methods for genomic DNA tiling arrays GENOME RESEARCH Bertone, P., Trifonov, V., Rozowsky, J. S., Schubert, F., Emanuelsson, O., Karro, J., Kao, M. Y., Snyder, M., Gerstein, M. 2006; 16 (2): 271-281

    Abstract

    A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.

    View details for DOI 10.1101/gr.4455906

    View details for Web of Science ID 000235122000015

    View details for PubMedID 16365382
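
    The dynamic programming formulation described in the abstract above can be sketched as follows. This is an illustrative reading, not the published algorithm: it assumes the sequence has already been masked into repetitive and non-repetitive positions, that tiles must be repeat-free and non-overlapping, and that the objective is to maximize the number of covered bases. The function name optimal_tile_path and its parameters are hypothetical. For a fixed range of tile sizes the running time is linear in the sequence length.

```python
def optimal_tile_path(is_repeat, min_len, max_len):
    """Pick non-overlapping repeat-free tiles that maximize covered bases.

    is_repeat: list of booleans, True where the base lies in a repeat.
    Returns (covered_bases, tiles), with tiles as half-open (start, end) pairs.
    """
    n = len(is_repeat)
    # clean_run[i]: number of consecutive non-repeat bases ending at position i - 1.
    clean_run = [0] * (n + 1)
    for i in range(1, n + 1):
        clean_run[i] = 0 if is_repeat[i - 1] else clean_run[i - 1] + 1

    best = [0] * (n + 1)    # best[i]: maximum coverage within the first i bases
    choice = [0] * (n + 1)  # length of the tile ending at i, or 0 if none ends there
    for i in range(1, n + 1):
        best[i] = best[i - 1]
        # A tile ending at i must fit inside the clean run that ends at i.
        for length in range(min_len, min(max_len, clean_run[i]) + 1):
            candidate = best[i - length] + length
            if candidate > best[i]:
                best[i] = candidate
                choice[i] = length

    # Trace back from the end of the sequence to recover the chosen tiles.
    tiles = []
    i = n
    while i > 0:
        if choice[i]:
            tiles.append((i - choice[i], i))
            i -= choice[i]
        else:
            i -= 1
    return best[n], tiles[::-1]
```

    For example, with five unique bases, two repeat bases, and three unique bases, and tile sizes restricted to 3 or 4, the sketch selects one tile of length 4 in the first run and one of length 3 in the second.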

  • ProCAT: a data analysis approach for protein microarrays GENOME BIOLOGY Zhu, X., Gerstein, M., Snyder, M. 2006; 7 (11)

    Abstract

    Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due to differences between the technologies. Here we report a new approach, ProCAT, which corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays.

    View details for DOI 10.1186/gb-2006-7-11-r110

    View details for Web of Science ID 000243967000014

    View details for PubMedID 17109749

  • BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments GENOME BIOLOGY Wang, L., Snyder, M., Gerstein, M. 2006; 7 (11)

    Abstract

    Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).

    View details for DOI 10.1186/gb-2006-7-11-r102

    View details for Web of Science ID 000243967000006

    View details for PubMedID 17078876

  • Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies DNA MICROARRAYS, PART B: DATABASES AND STATISTICS Royce, T. E., Rozowsky, J. S., Luscombe, N. M., Emanuelsson, O., Yu, H., Zhu, X., Snyder, M., Gerstein, M. B. 2006; 411: 282-311

    Abstract

    A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of the microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In so doing, we will assume little or no prior training in statistics of the reader. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.

    View details for DOI 10.1016/S0076-6879(06)11015-0

    View details for Web of Science ID 000244506300015

    View details for PubMedID 16939796

  • Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Seringhaus, M., Kumar, A., Hartigan, J., Snyder, M., Gerstein, M. 2006; 34 (8)

    Abstract

    Transposons are widely employed as tools for gene disruption. Ideally, they should display unbiased insertion behavior, and incorporate readily into any genomic DNA to which they are exposed. However, many transposons preferentially insert at specific nucleotide sequences. It is unclear to what extent such bias affects their usefulness as mutagenesis tools. Here, we examine insertion site specificity and global insertion behavior of two mini-transposons previously used for large-scale gene disruption in Saccharomyces cerevisiae: Tn3 and Tn7. Using an expanded set of insertion data, we confirm that Tn3 displays marked preference for the AT-rich 5 bp consensus site TA[A/T]TA, whereas Tn7 displays negligible target site preference. On a genome level, both transposons display marked non-uniform insertion behavior: certain sites are targeted far more often than expected, and both distributions depart drastically from Poisson. Thus, to compare their insertion behavior on a genome level, we developed a windowed Kolmogorov-Smirnov (K-S) test to analyze transposon insertion distributions in sequence windows of various sizes. We find that when scored in large windows (>300 bp), both Tn3 and Tn7 distributions appear uniform, whereas in smaller windows, Tn7 appears uniform while Tn3 does not. Thus, both transposons are effective tools for gene disruption, but Tn7 does so with less duplication and a more uniform distribution, better approximating the behavior of the ideal transposon.

    View details for DOI 10.1093/nar/gkl184

    View details for Web of Science ID 000237697000001

    View details for PubMedID 16648358

  • Novel transcribed regions in the human genome 71st Cold Spring Harbor Symposium on Quantitative Biology Rozowsky, J., Wu, J., Lian, Z., Nagalakshmi, U., Korbel, J. O., Kapranov, P., Zheng, D., Dyke, S., Newburger, P., Miller, P., Gingeras, T. R., Weissman, S., Gerstein, M., Snyder, M. COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2006: 111–116

    Abstract

    We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types, NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13-acetate (TPA), neutrophils, and placenta, throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role.

    View details for Web of Science ID 000245962800015

    View details for PubMedID 17381286

  • Global changes in STAT target selection and transcription regulation upon interferon treatments GENES & DEVELOPMENT Hartman, S. E., Bertone, P., Nath, A. K., Royce, T. E., Gerstein, M., Weissman, S., Snyder, M. 2005; 19 (24): 2953-2968

    Abstract

    The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain largely unknown. Using chromatin immunoprecipitation and DNA microarray analysis (ChIP-chip), we have identified the regions of human chromosome 22 bound by STAT1 and STAT2 in interferon-treated cells. Analysis of the genomic loci proximal to these binding sites introduced new candidate STAT1 and STAT2 target genes, several of which are affiliated with proliferation and apoptosis. The genes on chromosome 22 that exhibited interferon-induced up- or down-regulated expression were determined and correlated with the STAT-binding site information, revealing the potential regulatory effects of STAT1 and STAT2 on their target genes. Importantly, the comparison of STAT1-binding sites upon interferon (IFN)-gamma and IFN-alpha treatments revealed dramatic changes in binding locations between the two treatments. The IFN-alpha induction revealed nonconserved STAT1 occupancy at IFN-gamma-induced sites, as well as novel sites of STAT1 binding not evident in IFN-gamma-treated cells. Many of these correlated with binding by STAT2, but others were STAT2 independent, suggesting that multiple mechanisms direct STAT1 binding to its targets under different activation conditions. Overall, our results reveal a wealth of new information regarding IFN/STAT-binding targets and also fundamental insights into mechanisms of regulation of gene expression in different cell states.

    View details for DOI 10.1101/gad.1371305

    View details for Web of Science ID 000234095500004

    View details for PubMedID 16319195

  • Global analysis of protein phosphorylation in yeast NATURE Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X. W., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., McCartney, R. R., Schmidt, M. C., Rachidi, N., Lee, S. J., Mah, A. S., Meng, L., Stark, M. J., Stern, D. F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P. F., Snyder, M. 2005; 438 (7068): 679-684

    Abstract

    Protein phosphorylation is estimated to affect 30% of the proteome and is a major regulatory mechanism that controls many basic cellular processes. Until recently, our biochemical understanding of protein phosphorylation on a global scale has been extremely limited; only one half of the yeast kinases have known in vivo substrates and the phosphorylating kinase is known for less than 160 phosphoproteins. Here we describe, with the use of proteome chip technology, the in vitro substrates recognized by most yeast protein kinases: we identified over 4,000 phosphorylation events involving 1,325 different proteins. These substrates represent a broad spectrum of different biochemical functions and cellular roles. Distinct sets of substrates were recognized by each protein kinase, including closely related kinases of the protein kinase A family and four cyclin-dependent kinases that vary only in their cyclin subunits. Although many substrates reside in the same cellular compartment or belong to the same functional category as their phosphorylating kinase, many others do not, indicating possible new roles for several kinases. Furthermore, integration of the phosphorylation results with protein-protein interaction and transcription factor binding data revealed novel regulatory modules. Our phosphorylation results have been assembled into a first-generation phosphorylation map for yeast. Because many yeast proteins and pathways are conserved, these results will provide insights into the mechanisms and roles of protein phosphorylation in many eukaryotes.

    View details for DOI 10.1038/nature04187

    View details for Web of Science ID 000233593100053

    View details for PubMedID 16319894

  • Biochemical and genetic analysis of the yeast proteome with a movable ORF collection GENES & DEVELOPMENT Gelperin, D. M., White, M. A., Wilkinson, M. L., Kon, Y., Kung, L. A., Wise, K. J., Lopez-Hoyo, N., Jiang, L. X., Piccirillo, S., Yu, H. Y., Gerstein, M., Dumont, M. E., Phizicky, E. M., Snyder, M., Grayhack, E. J. 2005; 19 (23): 2816-2826

    Abstract

    Functional analysis of the proteome is an essential part of genomic research. To facilitate different proteomic approaches, a MORF (moveable ORF) library of 5854 yeast expression plasmids was constructed, each expressing a sequence-verified ORF as a C-terminal ORF fusion protein, under regulated control. Analysis of 5573 MORFs demonstrates that nearly all verified ORFs are expressed, suggests the authenticity of 48 ORFs characterized as dubious, and implicates specific processes including cytoskeletal organization and transcriptional control in growth inhibition caused by overexpression. Global analysis of glycosylated proteins identifies 109 new confirmed N-linked and 345 candidate glycoproteins, nearly doubling the known yeast glycome.

    View details for Web of Science ID 000233765900003

    View details for PubMedID 16322557

  • Advances in functional protein microarray technology 15th Biennial Conference on Methods in Protein Structure Analysis Bertone, P., Snyder, M. WILEY-BLACKWELL. 2005: 5400–5411

    Abstract

    Numerous innovations in high-throughput protein production and microarray surface technologies have enabled the development of addressable formats for proteins ordered at high spatial density. Protein array implementations have largely focused on antibody arrays for high-throughput protein profiling. However, it is also possible to construct arrays of full-length, functional proteins from a library of expression clones. The advent of protein-based microarrays allows the global observation of biochemical activities on an unprecedented scale, where hundreds or thousands of proteins can be simultaneously screened for protein-protein, protein-nucleic acid, and small molecule interactions. This technology holds great potential for basic molecular biology research, disease marker identification, toxicological response profiling and pharmaceutical target screening.

    View details for DOI 10.1111/j.1742-4658.2005.04970.x

    View details for Web of Science ID 000232772200003

    View details for PubMedID 16262682

  • Checkpoint maintenance requires Ame1 and Okp1 CELL CYCLE Pot, I., Knockleby, J., Aneliunas, V., Nguyen, T., Ah-Kye, S., Liszt, G., Snyder, M., Hieter, P., Vogel, J. 2005; 4 (10): 1448-1456

    Abstract

    Kinetochore proteins are required for high fidelity chromosome segregation and as a platform for checkpoint signaling. Ame1 is an essential component of the COMA (Ctf19, Okp1, Mcm21, Ame1) sub-complex of the central kinetochore of budding yeast. In this study, we describe the isolation and characterization of an Ame1 conditional mutant, ame1-4. ame1-4 cells exhibit chromosome segregation defects and Mad2-dependent cell cycle delay similar to okp1-5 cells. However, the viability of ame1-4 cells is markedly reduced relative to wild type and okp1-5 cells after three hours at restrictive temperature. To determine if ame1-4 cells enter anaphase with mis-segregated chromosomes, we monitored the localization of Bub3:VFP as a marker for anaphase onset. ame1-4 cells containing mis-segregated sister chromatids initially accumulate Bub3:VFP at kinetochores, indicating checkpoint activation and a metaphase arrest. Subsequently, Bub3:VFP de-localizes and cells reinitiate DNA duplication and budding without cytokinesis in the presence of un-segregated chromosomes. Overexpression of OKP1 in ame1-4 cells restores ame1-4 protein localization and a stable arrest. Based on our results, we propose that Ame1 and Okp1 are required for a sustained checkpoint arrest in the presence of mis-segregated chromosomes. Our results suggest that checkpoint response might be controlled not only at the level of activation but also via signals that ensure maintenance of the response.

    View details for Web of Science ID 000233751500030

    View details for PubMedID 16177574

  • A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray PLANT MOLECULAR BIOLOGY Stolc, V., Li, L., Wang, X. F., Li, X. Y., Su, N., Tongprasit, W., Han, B., Xue, Y. B., Li, J. Y., Snyder, M., Gerstein, M., Wang, J., Deng, X. W. 2005; 59 (1): 137-149

    Abstract

    As the international efforts to sequence the rice genome are completed, an immediate challenge and opportunity is to comprehensively and accurately define all transcription units in the rice genome. Here we describe a strategy of using high-density oligonucleotide tiling-path microarrays to map transcription of the japonica rice genome. In a pilot experiment to test this approach, one array representing the reverse strand of the last 11.2 Mb sequence of chromosome 10 was analyzed in detail based on a mathematical model developed in this study. Analysis of the array data detected 77% of the reference gene models in a mixture of four RNA populations. Moreover, significant transcriptional activities were found in many of the previously annotated intergenic regions. These preliminary results demonstrate the utility of genome tiling microarrays in evaluating annotated rice gene models and in identifying novel transcription units that will facilitate rice genome annotation.

    View details for DOI 10.1007/s11103-005-6164-5

    View details for Web of Science ID 000232498000012

    View details for PubMedID 16217608

  • Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping TRENDS IN GENETICS Royce, T. E., Rozowsky, J. S., Bertone, P., Samanta, M., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. 2005; 21 (8): 466-475

    Abstract

    Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.

    View details for DOI 10.1016/j.tig.2005.06.007

    View details for Web of Science ID 000231209200010

    View details for PubMedID 15979196

  • Prospects and challenges in proteomics PLANT PHYSIOLOGY Bertone, P., Snyder, M. 2005; 138 (2): 560-562

    View details for DOI 10.1104/pp.104.900154

    View details for Web of Science ID 000229774200009

    View details for PubMedID 15955915

  • Sexual dimorphism in mammalian gene expression TRENDS IN GENETICS Rinn, J. L., Snyder, M. 2005; 21 (5): 298-305

    Abstract

    Males and females have obvious phenotypic differences; they also exhibit differences related to health, life span, cognitive abilities and have different responses to diseases such as anemia, coronary heart disease, hypertension and renal dysfunction. Although the anatomical, hormonal and chemical differences between the sexes are well known, there are few molecular descriptors for gender-specific physiological traits and health risks. Recent studies using microarrays and other methods have made significant progress towards elucidating the molecular differences between mammalian sexes in a variety of tissues and towards identifying the transcription factors that regulate sex-biased gene expression. These findings are providing new insights into the molecular and genetic differences that dictate the different behaviors and physiologies of mammalian sexes.

    View details for DOI 10.1016/j.tig.2005.03.005

    View details for Web of Science ID 000229143800012

    View details for PubMedID 15851067

  • Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery CHROMOSOME RESEARCH Bertone, P., Gerstein, M., Snyder, M. 2005; 13 (3): 259-274

    Abstract

    Microarrays have become a popular and important technology for surveying global patterns in gene expression and regulation. A number of innovative experiments have extended microarray applications beyond the measurement of mRNA expression levels, in order to uncover aspects of large-scale chromosome function and dynamics. This has been made possible due to the recent development of tiling arrays, where all non-repetitive DNA comprising a chromosome or locus is represented at various sequence resolutions. Since tiling arrays are designed to contain the entire DNA sequence without prior consultation of existing gene annotation, they enable the discovery of novel transcribed sequences and regulatory elements through the unbiased interrogation of genomic loci. The implementation of such methods for the global analysis of large eukaryotic genomes presents significant technical challenges. Nonetheless, tiling arrays are expected to become instrumental for the genome-wide identification and characterization of functional elements. Combined with computational methods to relate these data and map the complex interactions of transcriptional regulators, tiling array experiments can provide insight toward a more comprehensive understanding of fundamental molecular and cellular processes.

    View details for DOI 10.1007/s10577-005-2165-0

    View details for Web of Science ID 000228868500005

    View details for PubMedID 15868420

  • Substrate specificity analysis of protein kinase complex Dbf2-Mob1 by peptide library and proteome array screening. BMC biochemistry Mah, A. S., Elia, A. E., Devgan, G., Ptacek, J., Schutkowski, M., Snyder, M., Yaffe, M. B., Deshaies, R. J. 2005; 6: 22

    Abstract

    The mitotic exit network (MEN) is a group of proteins that form a signaling cascade that is essential for cells to exit mitosis in Saccharomyces cerevisiae. The MEN has also been implicated in playing a role in cytokinesis. Two components of this signaling pathway are the protein kinase Dbf2 and its binding partner essential for its kinase activity, Mob1. The components of MEN that act upstream of Dbf2-Mob1 have been characterized, but physiological substrates for Dbf2-Mob1 have yet to be identified. Using a combination of peptide library selection, phosphorylation of optimal peptide variants, and screening of a phosphosite array, we found that Dbf2-Mob1 preferentially phosphorylated serine over threonine and required an arginine three residues upstream of the phosphorylated serine in its substrate. This requirement for arginine in peptide substrates could not be substituted with the similarly charged lysine. This specificity determined for peptide substrates was also evident in many of the proteins phosphorylated by Dbf2-Mob1 in a proteome chip analysis. We have determined by peptide library selection and phosphosite array screening that the protein kinase Dbf2-Mob1 preferentially phosphorylated substrates that contain an RXXS motif. A subsequent proteome microarray screen revealed proteins that can be phosphorylated by Dbf2-Mob1 in vitro. These proteins are enriched for RXXS motifs, and may include substrates that mediate the function of Dbf2-Mob1 in mitotic exit and cytokinesis. The relatively low degree of sequence restriction at the site of phosphorylation suggests that Dbf2 achieves specificity by docking its substrates at a site that is distinct from the phosphorylation site.

    View details for PubMedID 16242037
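
    The RXXS motif described in this abstract (an arginine three residues upstream of the phosphorylated serine, with any residue at the two positions in between) can be illustrated with a short sequence scan. This is only a minimal sketch and not the authors' analysis pipeline: the function name find_rxxs_sites and the example peptide are hypothetical, and positions are reported as 0-based indices of the serine.

```python
import re

# Hypothetical helper, not from the paper: report every serine that has an
# arginine exactly three residues upstream (the RXXS pattern).
# A lookahead is used so that overlapping motifs are all reported.
RXXS = re.compile(r"(?=R[A-Z]{2}S)")

def find_rxxs_sites(protein_seq):
    """Return 0-based indices of serines that sit in an RXXS context."""
    seq = protein_seq.upper()
    # m.start() is the arginine position; the serine is three residues later.
    return [m.start() + 3 for m in RXXS.finditer(seq)]

# Made-up peptide for illustration only.
sites = find_rxxs_sites("MRKASRRLSA")
```

    Lysine in place of arginine (KXXS) is deliberately not matched, mirroring the abstract's statement that lysine could not substitute for arginine in peptide substrates.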

  • Global analysis of protein function using protein microarrays 2nd International Conference on Functional Genomics of Ageing Smith, M. G., Jona, G., Ptacek, J., Devgan, G., Zhu, H., Zhu, X. W., Snyder, M. ELSEVIER IRELAND LTD. 2005: 171–75

    Abstract

    Protein microarrays containing thousands of proteins arrayed at high density can be prepared and probed for a wide variety of activities, thereby allowing the large scale analysis of many proteins simultaneously. In addition to identifying the activities of many previously uncharacterized proteins, protein microarrays can reveal new activities of well-characterized proteins, thus providing new insights about the functions of these proteins. Below, we describe the construction and use of protein microarrays and their applications using yeast as a model system.

    View details for DOI 10.1016/j.mad.2004.09.019

    View details for Web of Science ID 000226564000022

    View details for PubMedID 15610776

  • Global identification of human transcribed sequences with genome tiling arrays SCIENCE Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X. W., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M., Snyder, M. 2004; 306 (5705): 2242-2246

    Abstract

    Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.

    View details for DOI 10.1126/science.1103388

    View details for Web of Science ID 000225950000042

    View details for PubMedID 15539566

  • DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA White, E. J., Emanuelsson, O., Scalzo, D., Royce, T., Kosak, S., Oakeley, E. J., Weissman, S., Gerstein, M., Groudine, M., Snyder, M., Schubeler, D. 2004; 101 (51): 17771-17776

    Abstract

    Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific sequences can change during development; however, the determinants of this dynamic process are poorly understood. To gain insights into the contribution of developmental state, genomic sequence, and transcriptional activity to replication timing, we investigated the timing of DNA replication at high resolution along an entire human chromosome (chromosome 22) in two different cell types. The pattern of replication timing was correlated with respect to annotated genes, gene expression, novel transcribed regions of unknown function, sequence composition, and cytological features. We observed that chromosome 22 contains regions of early- and late-replicating domains of 100 kb to 2 Mb, many (but not all) of which are associated with previously described chromosomal bands. In both cell types, expressed sequences are replicated earlier than nontranscribed regions. However, several highly transcribed regions replicate late. Overall, the DNA replication-timing profiles of the two different cell types are remarkably similar, with only nine regions of difference observed. In one case, this difference reflects the differential expression of an annotated gene that resides in this region. Novel transcribed regions with low coding potential exhibit a strong propensity for early DNA replication. Although the cellular function of such transcripts is poorly understood, our results suggest that their activity is linked to the replication-timing program.

    View details for DOI 10.1073/pnas.0408170101

    View details for Web of Science ID 000225951500038

    View details for PubMedID 15591350

  • Regulation of gene expression by a metabolic enzyme SCIENCE Hall, D. A., Zhu, H., Zhu, X. W., Royce, T., Gerstein, M., Snyder, M. 2004; 306 (5695): 482-484

    Abstract

    Gene expression in eukaryotes is normally believed to be controlled by transcriptional regulators that activate genes encoding structural proteins and enzymes. To identify previously unrecognized DNA binding activities, a yeast proteome microarray was screened with DNA probes; Arg5,6, a well-characterized mitochondrial enzyme involved in arginine biosynthesis, was identified. Chromatin immunoprecipitation experiments revealed that Arg5,6 is associated with specific nuclear and mitochondrial loci in vivo, and Arg5,6 binds to specific fragments in vitro. Deletion of Arg5,6 causes altered transcript levels of both nuclear and mitochondrial target genes. These results indicate that metabolic enzymes can directly regulate eukaryotic gene expression.

    View details for Web of Science ID 000224626500052

    View details for PubMedID 15486299

  • Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon GENOME RESEARCH Kumar, A., Seringhaus, M., Biery, M. C., Sarnovsky, R. J., Umansky, L., Piccirillo, S., Heidtman, M., Cheung, K. H., Dobry, C. J., Gerstein, M. B., Craig, N. L., Snyder, M. 2004; 14 (10A): 1975-1986

    Abstract

    We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of approximately 300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis.

    View details for DOI 10.1101/gr.2875304

    View details for Web of Science ID 000224405900017

    View details for PubMedID 15466296

  • Genomic analysis of regulatory network dynamics reveals large topological changes NATURE Luscombe, N. M., Babu, M. M., Yu, H. Y., Snyder, M., Teichmann, S. A., Gerstein, M. 2004; 431 (7006): 308-312

    Abstract

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.

    View details for DOI 10.1038/nature02782

    View details for Web of Science ID 000223864000041

    View details for PubMedID 15372033

  • Major molecular differences between mammalian sexes are involved in drug metabolism and renal function DEVELOPMENTAL CELL Rinn, J. L., Rozowsky, J. S., Laurenzi, I. J., Petersen, P. H., Zou, K. Y., Zhong, W. M., Gerstein, M., Snyder, M. 2004; 6 (6): 791-800

    Abstract

    Many anatomical differences exist between males and females; these are manifested on a molecular level by different hormonal environments. Although several molecular differences in adult tissues have been identified, a comprehensive investigation of the gene expression differences between males and females has not been performed. We surveyed the expression patterns of 13,977 mouse genes in male and female hypothalamus, kidney, liver, and reproductive tissues. Extensive differential gene expression was observed not only in the reproductive tissues, but also in the kidney and liver. The differentially expressed genes are involved in drug and steroid metabolism, osmotic regulation, or as yet unresolved cellular roles. In contrast, very few molecular differences were observed between the male and female hypothalamus in both mice and humans. We conclude that there are persistent differences in gene expression between adult males and females. These molecular differences have important implications for the physiological differences between males and females.

    View details for Web of Science ID 000222443200012

    View details for PubMedID 15177028

  • CREB binds to multiple loci on human chromosome 22 MOLECULAR AND CELLULAR BIOLOGY Euskirchen, G., Royce, T. E., Bertone, P., Martone, R., Rinn, J. L., Nelson, F. K., Sayward, F., Luscombe, N. M., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2004; 24 (9): 3804-3814

    Abstract

    The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An unbiased, global analysis of where CREB binds has not been performed. We have mapped for the first time the binding distribution of CREB along an entire human chromosome. Chromatin immunoprecipitation of CREB-associated DNA and subsequent hybridization of the associated DNA to a genomic DNA microarray containing all of the nonrepetitive DNA of human chromosome 22 revealed 215 binding sites corresponding to 192 different loci and 100 annotated potential gene targets. We found binding near or within many genes involved in signal transduction and neuronal function. We also found that only a small fraction of CREB binding sites lay near well-defined 5' ends of genes; the majority of sites were found elsewhere, including introns and unannotated regions. Several of the latter lay near novel unannotated transcriptionally active regions. Few CREB targets were found near full-length cyclic AMP response element sites; the majority contained shorter versions or close matches to this sequence. Several of the CREB targets were altered in their expression by treatment with forskolin; interestingly, both induced and repressed genes were found. Our results provide novel molecular insights into how CREB mediates its functions in humans.

    View details for DOI 10.1128/MCB.24.9.3804-3814.2004

    View details for Web of Science ID 000220898100021

    View details for PubMedID 15082775

  • Microbial synergy via an ethanol-triggered pathway MOLECULAR AND CELLULAR BIOLOGY Smith, M. G., Des Etages, S. G., Snyder, M. 2004; 24 (9): 3874-3884

    Abstract

    We have discovered a microbial interaction between yeast, bacteria, and nematodes. Upon coculturing, Saccharomyces cerevisiae stimulated the growth of several species of Acinetobacter, including A. baumannii, A. haemolyticus, A. johnsonii, and A. radioresistens, as well as several natural isolates of Acinetobacter. This enhanced growth was due to a diffusible factor that was shown to be ethanol by chemical assays and evaluation of strains lacking ADH1, ADH3, and ADH5, as all three genes are involved in ethanol production by yeast. This effect is specific to ethanol: methanol, butanol, and dimethyl sulfoxide were unable to stimulate growth to any appreciable level. Low doses of ethanol not only stimulated growth to a higher cell density but also served as a signaling molecule: in the presence of ethanol, Acinetobacter species were able to withstand the toxic effects of salt, indicating that ethanol alters cell physiology. Furthermore, ethanol-fed A. baumannii displayed increased pathogenicity when confronted with a predator, Caenorhabditis elegans. Our results are consistent with the concept that ethanol can serve as a signaling molecule which can affect bacterial physiology and survival.

    View details for DOI 10.1128/MCB.24.9.3874-3884.2004

    View details for Web of Science ID 000220898100027

    View details for PubMedID 15082781

  • Regulation of polarized growth initiation and termination cycles by the polarisome and Cdc42 regulators JOURNAL OF CELL BIOLOGY Bidlingmaier, S., Snyder, M. 2004; 164 (2): 207-218

    Abstract

    The dynamic regulation of polarized cell growth allows cells to form structures of defined size and shape. We have studied the regulation of polarized growth using mating yeast as a model. Haploid yeast cells treated with a high concentration of pheromone form successive mating projections that initiate and terminate growth with regular periodicity. The mechanisms that control the frequency of growth initiation and termination under these conditions are not well understood. We found that the polarisome components Spa2, Pea2, and Bni1 and the Cdc42 regulators Cdc24 and Bem3 control the timing and frequency of projection formation. Loss of polarisome components and mutation of Cdc24 decrease the frequency of projection formation, while loss of Bem3 increases the frequency of projection formation. We found that polarisome components and the cell fusion proteins Fus1 and Fus2 are important for the termination of projection growth. Our results define the first molecular regulators that control the timing of growth initiation and termination during eukaryotic cell differentiation.

    View details for DOI 10.1083/jcb.200307065

    View details for Web of Science ID 000188370500006

    View details for PubMedID 14734532

  • Fast optimal genome tiling with applications to microarray design and homology search JOURNAL OF COMPUTATIONAL BIOLOGY Berman, P., Bertone, P., DasGupta, B., Gerstein, M., Kao, M. Y., Snyder, M. 2004; 11 (4): 766-785

    Abstract

    In this paper, we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size-bound parameters, we want to find a set of tiles of maximum total weight such that each tile satisfies the size bounds. A solution to this problem is important to a number of computational biology applications such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding-window approach and show how to use this solution in a nontrivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.

    View details for Web of Science ID 000223974700015

    View details for PubMedID 15579244
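
    The sliding-window idea mentioned in the abstract above can be illustrated on a simplified version of the problem: finding a single contiguous tile of maximum total weight whose length lies between two bounds. The sketch below is a generic textbook technique (prefix sums plus a monotonic deque), not the authors' algorithm, and it does not select a full set of tiles. The function name `best_tile` and its parameters are hypothetical.

```python
from collections import deque


def best_tile(weights, min_len, max_len):
    """Return (total, start, end) for the maximum-weight contiguous tile
    weights[start:end] with min_len <= end - start <= max_len.

    Illustrative only: a single-tile simplification of the tiling problem.
    Returns None if no tile of an allowed length fits in the sequence.
    """
    n = len(weights)
    # prefix[i] holds the sum of weights[:i], so a tile's weight is
    # prefix[end] - prefix[start].
    prefix = [0.0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    best = None
    window = deque()  # candidate start indices, prefix values increasing
    for end in range(min_len, n + 1):
        newest_start = end - min_len
        # Keep the deque monotonic: drop starts that can never beat the new one.
        while window and prefix[window[-1]] >= prefix[newest_start]:
            window.pop()
        window.append(newest_start)
        # Drop starts that would make the tile longer than max_len.
        while window[0] < end - max_len:
            window.popleft()
        start = window[0]  # smallest prefix value among allowed starts
        total = prefix[end] - prefix[start]
        if best is None or total > best[0]:
            best = (total, start, end)
    return best
```

    Each index enters and leaves the deque at most once, so this sketch runs in time linear in the sequence length, in the spirit of the linear-time goal stated in the abstract.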

  • Analyzing antibody specificity with whole proteome microarrays NATURE BIOTECHNOLOGY Michaud, G. A., Salcius, M., Zhou, F., Bangham, R., Bonin, J., Guo, H., Snyder, M., Predki, P. F., Schweitzer, B. I. 2003; 21 (12): 1509-1512

    Abstract

    Although approximately 10,000 antibodies are available from commercial sources, antibody reagents are still unavailable for most proteins. Furthermore, new applications such as antibody arrays and monoclonal antibody therapeutics have increased the demand for more specific antibodies to reduce cross-reactivity and side effects. An array containing every protein for the relevant organism represents the ideal format for an assay to test antibody specificity, because it allows the simultaneous screening of thousands of proteins for possible cross-reactivity. As an initial test of this approach, we screened 11 polyclonal and monoclonal antibodies to approximately 5,000 different yeast proteins deposited on a glass slide and found that, in addition to recognizing their cognate proteins, the antibodies cross-reacted with other yeast proteins to varying degrees. Some of the interactions of the antibodies with noncognate proteins could be deduced by alignment of the primary amino acid sequences of the antigens and cross-reactive proteins; however, these interactions could not be predicted a priori. Our findings show that proteome array technology has potential to improve antibody design and selection for applications in both medicine and research.

    View details for DOI 10.1038/nbt910

    View details for Web of Science ID 000186845200031

    View details for PubMedID 14608365

  • Changes in the nutrient content of school lunches: results from the Pathways study PREVENTIVE MEDICINE Story, M., Snyder, M. P., Anliker, J., Weber, J. L., Cunningham-Sabo, L., Stone, E. J., Chamberlain, A., Ethelbah, B., Suchindran, C., Ring, K. 2003; 37 (6): S35-S45

    Abstract

    Pathways, a randomized trial, evaluated the effectiveness of a school-based multicomponent intervention to reduce fatness in American-Indian schoolchildren. The goal of the Pathways food service intervention component was to reduce the fat in school lunches to no more than 30% of energy from fat while maintaining recommended levels of calories and key nutrients. The intervention was implemented by school food service staff in intervention schools over a 3-year period. Five consecutive days of school lunch menu items were collected from 20 control and 21 intervention schools at four time periods, and nutrient content was analyzed. There was a significantly greater mean reduction in percent energy from fat and saturated fat in the intervention schools compared to the control schools. Mean percentages of energy from fat decreased from 33.1% at baseline to 28.3% at the end of the study in intervention schools compared to 33.2% at baseline and 32.2% at follow-up in the control schools (P<0.003). There were no statistically significant differences for calories or nutrients between intervention and control schools. The Pathways school food lunch intervention documented the feasibility of successfully lowering the percent of energy from fat, as part of a coordinated obesity prevention program for American-Indian children.

    View details for DOI 10.1016/j.ypmed.2003.08.009

    View details for Web of Science ID 000187114300005

    View details for PubMedID 14636807

  • Negative regulation of calcineurin signaling by Hrr25p, a yeast homolog of casein kinase I GENES & DEVELOPMENT Kafadar, K. A., Zhu, H., Snyder, M., Cyert, M. S. 2003; 17 (21): 2698-2708

    Abstract

    Calcineurin is a Ca2+/calmodulin-regulated protein phosphatase required for Saccharomyces cerevisiae to respond to a variety of environmental stresses. Calcineurin promotes cell survival during stress by dephosphorylating and activating the Zn-finger transcription factor Crz1p/Tcn1p. Using a high-throughput assay, we screened 119 yeast kinases for their ability to phosphorylate Crz1p in vitro and identified the casein kinase I homolog Hrr25p. Here we show that Hrr25p negatively regulates Crz1p activity and nuclear localization in vivo. Hrr25p binds to and phosphorylates Crz1p in vitro and in vivo. Overexpression of Hrr25p decreases Crz1p-dependent transcription and antagonizes its Ca2+-induced nuclear accumulation. In the absence of Hrr25p, activation of Crz1p by Ca2+/calcineurin is potentiated. These findings represent the first identification of a negative regulator for Crz1p, and establish a novel physiological role for Hrr25p in antagonizing calcineurin signaling.

    View details for DOI 10.1101/gad.1140603

    View details for Web of Science ID 000186299700011

    View details for PubMedID 14597664

    View details for PubMedCentralID PMC280619

  • Microarrays to characterize protein interactions on a whole-proteome scale PROTEOMICS Schweitzer, B., Predki, P., Snyder, M. 2003; 3 (11): 2190-2199

    Abstract

    Protein microarrays contain a defined set of proteins spotted and analyzed at high density, and can be generally classified into two categories: protein profiling arrays and functional protein arrays. Functional protein arrays can be made up of any type of protein, and therefore have a diverse set of useful applications. Advantages of these arrays include low reagent consumption, rapid interpretation of results, and the ability to easily control experimental conditions. The ultimate form of a functional protein array consists of all of the proteins encoded by the genome of an organism; such an array would be the whole proteome equivalent of the whole genome DNA arrays that are now available. While proteome microarrays may not have reached the stage of maturity of DNA microarrays, recent developments have shown that many of the barriers holding back the technology can be overcome. Arrays of this type have already been used to rapidly screen large numbers of proteins simultaneously for biochemical activities, protein-protein interactions, protein-lipid interactions, protein-nucleic acid interactions, and protein-small molecule interactions. Eventually, functional protein arrays will be used to facilitate various steps in the drug discovery and early development processes that are currently bottlenecks in the drug development continuum.

    View details for DOI 10.1002/pmic.200300610

    View details for Web of Science ID 000186582500015

    View details for PubMedID 14595818

  • A Bayesian networks approach for predicting protein-protein interactions from genomic data SCIENCE Jansen, R., Yu, H. Y., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S. B., Emili, A., Snyder, M., Greenblatt, J. F., Gerstein, M. 2003; 302 (5644): 449-453

    Abstract

    We have developed an approach using Bayesian networks to predict protein-protein interactions genome-wide in yeast. Our method naturally weights and combines into reliable predictions genomic features only weakly associated with interaction (e.g., messenger RNA coexpression, coessentiality, and colocalization). In addition to de novo predictions, it can integrate often noisy, experimental interaction data sets. We observe that at given levels of sensitivity, our predictions are more accurate than the existing high-throughput experimental data sets. We validate our predictions with TAP (tandem affinity purification) tagging experiments. Our analysis, which gives a comprehensive view of yeast interactions, is available at genecensus.org/intint.

    View details for Web of Science ID 000185963200044

    View details for PubMedID 14564010
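
    One simple way to picture how several weak features can be weighted and combined, as the abstract above describes, is the naive Bayes special case of a Bayesian network, in which features are assumed conditionally independent and each contributes a likelihood ratio to the odds of interaction. The sketch below illustrates only that general idea; it is not the network structure used in the paper, and the function name and all numbers in the usage note are made up for illustration.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine evidence under a naive Bayes assumption.

    prior_odds: odds of interaction before seeing any feature (hypothetical).
    likelihood_ratios: one ratio P(feature | interacting) /
        P(feature | not interacting) per observed feature (hypothetical).
    Returns the posterior odds, the prior odds times every ratio.
    """
    # Work in log space so that many small ratios do not underflow.
    log_odds = math.log(prior_odds)
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return math.exp(log_odds)
```

    For example, with an illustrative prior of 1 in 600 and three features carrying likelihood ratios of 10, 5, and 3, `posterior_odds(1 / 600, [10, 5, 3])` gives odds of about 0.25: no single feature is decisive, but together they shift the odds by a factor of 150.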

  • Distribution of NF-kappa B-binding sites across human chromosome 22 PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T. E., Luscombe, N. M., Rinn, J. L., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 100 (21): 12247-12252

    Abstract

    We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-kappaB family of transcription factors plays an essential role in regulating the induction of genes involved in several physiological processes, including apoptosis, immunity, and inflammation. The binding sites of the NF-kappaB family member p65 were determined by using chromatin immunoprecipitation and a genomic microarray of human chromosome 22 DNA. Sites of binding were observed along the entire chromosome in both coding and noncoding regions, with an enrichment at the 5' end of genes. Strikingly, a significant proportion of binding was seen in intronic regions, demonstrating that transcription factor binding is not restricted to promoter regions. NF-kappaB binding was also found at genes whose expression was regulated by tumor necrosis factor alpha, a known inducer of NF-kappaB-dependent gene expression, as well as adjacent to genes whose expression is not affected by tumor necrosis factor alpha. Many of these latter genes are either known to be activated by NF-kappaB under other conditions or are consistent with NF-kappaB's role in the immune and apoptotic responses. Our results suggest that binding is not restricted to promoter regions and that NF-kappaB binding occurs at a significant number of genes whose expression is not altered, thereby suggesting that binding alone is not sufficient for gene activation.

    View details for DOI 10.1073/pnas.2135255100

    View details for Web of Science ID 000186024300058

    View details for PubMedID 14527995

  • Cytoskeletal activation of a checkpoint kinase MOLECULAR CELL Hanrahan, J., Snyder, M. 2003; 12 (3): 663-673

    Abstract

    The assembly of cytoskeletal structures is coupled to other cellular processes. We have studied the molecular mechanism by which assembly of the yeast septin cytoskeleton is monitored and coordinated with cell cycle progression by analyzing a key regulatory protein kinase, Hsl1, that becomes activated only when the septin cytoskeleton is properly assembled. We first identified a regulatory region of Hsl1 that physically associates with the kinase domain and found that it performs an autoinhibitory function both in vivo and in vitro. Several septin binding domains lie near and overlap the inhibitory domain; these are important for Hsl1 function, and binding of two septins, Cdc11 and Cdc12, relieves the autoinhibition imposed by the kinase inhibitory domain in vitro. Our results suggest that binding to multiple septins activates Hsl1 kinase activity, thereby promoting cell cycle progression. The high conservation of Hsl1 indicates that similar mechanisms may monitor cytoskeletal organization in other eukaryotes.

    View details for Web of Science ID 000185613800015

    View details for PubMedID 14527412

  • Specific protein targeting during cell differentiation: Polarized localization of Fus1p during mating depends on Chs5p in Saccharomyces cerevisiae EUKARYOTIC CELL Santos, B., Snyder, M. 2003; 2 (4): 821-825

    Abstract

    In budding yeast, chs5 mutants are defective in chitin synthesis and cell fusion during mating. Chs5p is a late-Golgi protein required for the polarized transport of the chitin synthase Chs3p to the membrane. Here we show that Chs5p is also essential for the polarized targeting of Fus1p, but not of other cell fusion proteins, to the membrane during mating.

    View details for DOI 10.1128/EC.2.4.821-825.2003

    View details for Web of Science ID 000184803000018

    View details for PubMedID 12912901

  • ExpressYourself: a modular platform for processing and visualizing microarray data NUCLEIC ACIDS RESEARCH Luscombe, N. M., Royce, T. E., Bertone, P., Echols, N., Horak, C. E., Chang, J. T., Snyder, M., Gerstein, M. 2003; 31 (13): 3477-3482

    Abstract

    DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself.

    View details for DOI 10.1093/nar/gkg628

    View details for Web of Science ID 000183832900040

    View details for PubMedID 12824348

  • Recent developments in analytical and functional protein microarrays CURRENT OPINION IN MOLECULAR THERAPEUTICS Jona, G., Snyder, M. 2003; 5 (3): 271-277

    Abstract

    In recent years, the genomes of many different organisms have been fully sequenced and annotated. As a consequence of this information, a number of methods have emerged to study the function of many genes and proteins in parallel. One recent approach for the large-scale analysis of proteins is the use of protein microarrays in which hundreds to thousands of proteins are arrayed and assayed simultaneously. Protein arrays can be used for assessing protein levels and following disease markers, identifying biochemical activities, analyzing post-translational modifications, building interaction networks, and for drug discovery and development. In this review, we discuss the construction of different types of protein arrays, and their numerous and diverse applications.

    View details for Web of Science ID 000184024600009

    View details for PubMedID 12870437

  • Genomics - Defining genes in the genomics era SCIENCE Snyder, M., Gerstein, M. 2003; 300 (5617): 258-260

    View details for Web of Science ID 000182135400032

    View details for PubMedID 12690176

  • Molecular dissection of a yeast septin: Distinct domains are required for septin interaction, localization, and function MOLECULAR AND CELLULAR BIOLOGY Casamayor, A., Snyder, M. 2003; 23 (8): 2762-2777

    Abstract

    The septins are a family of cytoskeletal proteins present in animal and fungal cells. They were first identified for their essential role in cytokinesis, but more recently, they have been found to play an important role in many cellular processes, including bud site selection, chitin deposition, cell compartmentalization, and exocytosis. Septin proteins self-associate into filamentous structures that, in yeast cells, form a cortical ring at the mother bud neck. Members of the septin family share common structural domains: a GTPase domain in the central region of the protein, a stretch of basic residues at the amino terminus, and a predicted coiled-coil domain at the carboxy terminus. We have studied the role of each domain in the Saccharomyces cerevisiae septin Cdc11 and found that the three domains are responsible for distinct and sometimes overlapping functions. All three domains are important for proper localization and function in cytokinesis and morphogenesis. The basic region was found to bind the phosphoinositides phosphatidylinositol 4-phosphate and phosphatidylinositol 5-phosphate. The coiled-coil domain is important for interaction with Cdc3 and Bem4. The GTPase domain is involved in Cdc11-septin interaction and targeting to the mother bud neck. Surprisingly, GTP binding appears to be dispensable for Cdc11 function, localization, and lipid binding. Thus, we find that septins are multifunctional proteins with specific domains involved in distinct molecular interactions required for assembly, localization, and function within the cell.

    View details for DOI 10.1128/MCB.23.8.2762-2777.2003

    View details for Web of Science ID 000182049900012

    View details for PubMedID 12665577

  • Protein analysis on a proteomic scale NATURE Phizicky, E., Bastiaens, P. I., Zhu, H., Snyder, M., Fields, S. 2003; 422 (6928): 208-215

    Abstract

    The long-term challenge of proteomics is enormous: to define the identities, quantities, structures and functions of complete complements of proteins, and to characterize how these properties vary in different cellular contexts. One critical step in tackling this goal is the generation of sets of clones that express a representative of each protein of a proteome in a useful format, followed by the analysis of these sets on a genome-wide basis. Such studies enable genetic, biochemical and cell biological technologies to be applied on a systematic level, leading to the assignment of biochemical activities, the construction of protein arrays, the identification of interactions, and the localization of proteins within cellular compartments.

    View details for DOI 10.1038/nature01512

    View details for Web of Science ID 000181488900055

    View details for PubMedID 12634794

  • The transcriptional activity of human Chromosome 22 GENES & DEVELOPMENT Rinn, J. L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N. M., Hartman, S., Harrison, P. M., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 17 (4): 529-540

    Abstract

    A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.

    View details for DOI 10.1101/gad.1055203

    View details for Web of Science ID 000181276200011

    View details for PubMedID 12600945

  • Protein chip technology CURRENT OPINION IN CHEMICAL BIOLOGY Zhu, H., Snyder, M. 2003; 7 (1): 55-63

    Abstract

    Microarray technology has become a crucial tool for large-scale and high-throughput biology. It allows fast, easy and parallel detection of thousands of addressable elements in a single experiment. In the past few years, protein microarray technology has shown its great potential in basic research, diagnostics and drug discovery. It has been applied to analyse antibody-antigen, protein-protein, protein-nucleic-acid, protein-lipid and protein-small-molecule interactions, as well as enzyme-substrate interactions. Recent progress in the field of protein chips includes surface chemistry, capture molecule attachment, protein labeling and detection methods, high-throughput protein/antibody production, and applications to analyse entire proteomes.

    View details for DOI 10.1016/S1367-5931(02)00005-4

    View details for Web of Science ID 000180868900009

    View details for PubMedID 12547427

  • Identification of novel functional elements in the human genome 67th Cold Spring Harbor Symposium on Quantitative Biology Lian, Z., Euskirchen, G., Rinn, J., Martone, R., Bertone, P., Hartman, S., Royce, T., Nelson, K., Sayward, F., Luscombe, N., Yang, J., Li, J. L., Miller, P., Urban, A. E., Gerstein, M., Weissman, S., Snyder, M. COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2003: 317–322

    View details for Web of Science ID 000222969300037

    View details for PubMedID 15338632

  • Proteomics ANNUAL REVIEW OF BIOCHEMISTRY Zhu, H., Bilgin, M., Snyder, M. 2003; 72: 783-812

    Abstract

    Fueled by ever-growing DNA sequence information, proteomics, the large-scale analysis of proteins, has become one of the most important disciplines for characterizing gene function, for building functional linkages between protein molecules, and for providing insight into the mechanisms of biological processes in a high-throughput mode. It is now possible to examine the expression of more than 1000 proteins using mass spectrometry technology coupled with various separation methods. High-throughput yeast two-hybrid approaches and analysis of protein complexes using affinity tag purification have yielded valuable protein-protein interaction maps. Large-scale protein tagging and subcellular localization projects have provided considerable information about protein function. Finally, recent developments in protein microarray technology provide a versatile tool to study protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and protein-drug interactions. Other types of microarrays, though not fully developed, also show great potential in diagnostics, protein profiling, and drug identification and validation. This review discusses high-throughput technologies for proteome analysis and their applications. Also discussed are the approaches used for the integrated analysis of the voluminous sets of data generated by proteome analysis conducted on a global scale.

    View details for DOI 10.1146/annurev.biochem.72.121801.161511

    View details for Web of Science ID 000185092500024

    View details for PubMedID 14527327

  • Proteomic approaches for the global analysis of proteins BIOTECHNIQUES Michaud, G. A., Snyder, M. 2002; 33 (6): 1308-1316

    Abstract

    Improvements in technology that allow miniaturization and high-throughput analyses of thousands of genes and gene products have changed the focus and scope of research and development in both academia and industry. It is now possible to study entire proteomes with the goals of elucidating protein expression, subcellular localization, biochemical activities, and their regulation. Alterations in different cell types and conditions and in normal and disease states can be revealed. This wealth of information not only has facilitated our basic understanding of many biological processes but also has enormous potential for drug discovery and development.

    View details for Web of Science ID 000179996500022

    View details for PubMedID 12503317

  • Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae GENES & DEVELOPMENT Horak, C. E., Luscombe, N. M., Qian, J. A., Bertone, P., Piccirrillo, S., Gerstein, M., Snyder, M. 2002; 16 (23): 3017-3033

    Abstract

    In the yeast Saccharomyces cerevisiae, SBF (Swi4-Swi6 cell cycle box binding factor) and MBF (MluI binding factor) are the major transcription factors regulating the START of the cell cycle, a time just before DNA replication, bud growth initiation, and spindle pole body (SPB) duplication. These two factors bind to the promoters of 235 genes, but bind less than a quarter of the promoters upstream of genes with peak transcript levels at the G1 phase of the cell cycle. Several functional categories, which are known to be crucial for G1/S events, such as SPB duplication/migration and DNA synthesis, are under-represented in the list of SBF and MBF gene targets. SBF binds the promoters of several other transcription factors, including HCM1, PLM2, POG1, TOS4, TOS8, TYE7, YAP5, YHP1, and YOX1. Here, we demonstrate that these factors are targets of SBF using an independent assay. To further elucidate the transcriptional circuitry that regulates the G1-to-S-phase progression, these factors were epitope-tagged and their binding targets were identified by ChIP-chip analysis. These factors bind the promoters of genes with roles in G1/S events including DNA replication, bud growth, and spindle pole complex formation, as well as the general activities of mitochondrial function, transcription, and protein synthesis. Although functional overlap exists between these factors and MBF and SBF, each of these factors has distinct functional roles. Most of these factors bind the promoters of other transcription factors known to be cell cycle regulated or known to be important for cell cycle progression and differentiation processes, indicating that a complex network of transcription factors coordinates the diverse activities that initiate a new cell cycle.

    View details for DOI 10.1101/gad.1039602

    View details for Web of Science ID 000179649300005

    View details for PubMedID 12464632

  • The alpha-factor receptor C-terminus is important for mating projection formation and orientation in Saccharomyces cerevisiae CELL MOTILITY AND THE CYTOSKELETON Vallier, L. G., Segall, J. E., Snyder, M. 2002; 53 (4): 251-266

    Abstract

    Successful mating of MATa Saccharomyces cerevisiae cells is dependent on Ste2p, the alpha-factor receptor. Besides receiving the pheromone signal and transducing it through the G-protein coupled MAP kinase pathway, Ste2p is active in the establishment and orientation of the mating projection. We investigated the role of the carboxyl terminus of the receptor in mating projection formation and orientation using a spatial gradient assay. Cells carrying the ste2-T326 mutation, truncating 105 of the 135 amino acids in the receptor tail including a motif necessary for its ligand-mediated internalization, display slow onset of projection formation, abnormal shmoo morphology, and reduced ability to orient the mating projection toward a pheromone source. This reduction was due to the increased loss of mating projection orientation in a pheromone gradient. Cells with a mutated endocytosis motif were defective in reorientation in a pheromone gradient. ste2-Delta296 cells, which carry a complete truncation of the Ste2p tail, exhibit a severe defect in projection formation, and those projections that do form are unable to orient in a pheromone gradient. These results suggest a complex role for the Ste2p carboxy-terminal tail in the formation, orientation, and directional adjustment of the mating projection, and that endocytosis of the receptor is important for this process. In addition, mutations in RSR1/BUD1 and SPA2, genes necessary for budding polarity, exhibited little or no defect in formation or orientation of mating projections. We conclude that mating projection orientation depends upon the carboxyl terminus of the pheromone receptor and not the directional machinery used in budding.

    View details for DOI 10.1002/cm.10073

    View details for Web of Science ID 000179314000001

    View details for PubMedID 12378535

  • A novel mitochondrial protein, Tar1p, is encoded on the antisense strand of the nuclear 25S rDNA GENES & DEVELOPMENT Coelho, P. S., Bryan, A. C., Kumar, A., Shadel, G. S., Snyder, M. 2002; 16 (21): 2755-2760

    Abstract

    In eukaryotes, it is widely assumed that genes coding for proteins and structural RNAs do not overlap. Using a transposon-tagging strategy to globally analyze the Saccharomyces cerevisiae genome for expressed genes, we identified multiple insertions in an open reading frame that is contained fully within and transcribed antisense to the 25S rRNA gene in the nuclear rDNA repeat region on Chromosome XII. Expression of this gene, TAR1 (Transcript Antisense to Ribosomal RNA), can be detected at the RNA and protein levels, and the primary sequence of the corresponding 124-amino-acid protein is conserved in several yeast species. Tar1p was found to localize to mitochondria, and overexpression of the protein suppresses the respiration-deficient petite phenotype of a point mutation in mitochondrial RNA polymerase that affects mitochondrial gene expression and mtDNA stability. These findings indicate that coding information for protein and structural RNAs can overlap, raising issues regarding the coevolution of such complex genes, and also suggest that rDNA transcription and mitochondrial function are coordinately regulated in eukaryotic cells.

    View details for DOI 10.1101/gad.1035002

    View details for Web of Science ID 000179027900004

    View details for PubMedID 12414727

  • A dynamic approach to mapping coordinates between microplates and microarrays JOURNAL OF BIOMEDICAL INFORMATICS Cheung, K. H., Hager, J., Nelson, K., White, K., Li, Y. L., Snyder, M., Williams, K., Miller, P. 2002; 35 (5-6): 306-312

    Abstract

    The retrieval of useful data from spotted microarray slides requires keeping track of which microplate well and DNA sample correspond to each spot on each array slide. Existing approaches are closely coupled with the type of arrayer in use and are computer operating-system-specific. To support the microarray researcher community at large who use different arrayers and computer platforms, increased flexibility, generality, and portability of these approaches are required. In this paper, we describe a general algorithm that correlates the well positions of DNA samples in each microplate to the positions of the spots on each array slide. Based on this algorithm, we have implemented a flexible and platform-independent program named MicroArray Convolutor (MAC) that provides a Web solution allowing the user to: (a) import a text file that identifies the DNA samples and their well locations, (b) select a transformation method that converts data in 96-well plate format into 384-well plate format, and (c) specify the output format of the array lists dependent on the configuration of the array platform as well as the downstream analysis software chosen for the array. MAC and its source code can be accessed via the following Web address: http://ymd.med.yale.edu/kei-cgi/kc_mac_dev8.pl.

    View details for DOI 10.1016/S1532-0464(03)00033-9

    View details for Web of Science ID 000184879000004

    View details for PubMedID 12968779

  • Global analysis of gene expression in yeast. Functional & integrative genomics Horak, C. E., Snyder, M. 2002; 2 (4-5): 171-180

    Abstract

    In the past decade, there has been an intense effort to comprehensively catalogue the expressed genes in the yeast Saccharomyces cerevisiae and to determine the absolute and relative abundance of transcript and protein levels under different cellular conditions. Several methods have been developed to monitor gene expression: DNA microarray analysis, Serial Analysis of Gene Expression (SAGE), kinetic RT-PCR and monitoring expression of beta-galactosidase fusion proteins. These techniques have been used to measure transcript and protein abundance in different developmental states and under different environmental stimuli. A wealth of expression data for yeast is now publicly available through several web sites. The expression information that exists has the obvious benefits of providing a better understanding of the gene expression patterns that accompany changes in a yeast cell's environmental and developmental states. This data has also, however, provided clues to unraveling the complicated questions surrounding gene regulation: why and how is gene expression controlled?

    View details for PubMedID 12192590

  • Functional profiling of the Saccharomyces cerevisiae genome NATURE Giaever, G., Chu, A. M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., Arkin, A. P., Astromoff, A., El Bakkoury, M., Bangham, R., Benito, R., Brachat, S., Campanaro, S., Curtiss, M., Davis, K., Deutschbauer, A., Entian, K. D., Flaherty, P., Foury, F., Garfinkel, D. J., Gerstein, M., Gotte, D., Guldener, U., Hegemann, J. H., Hempel, S., Herman, Z., Jaramillo, D. F., Kelly, D. E., Kelly, S. L., Kotter, P., LaBonte, D., Lamb, D. C., Lan, N., Liang, H., Liao, H., Liu, L., Luo, C. Y., Lussier, M., Mao, R., Menard, P., Ooi, S. L., Revuelta, J. L., Roberts, C. J., Rose, M., Ross-Macdonald, P., Scherens, B., Schimmack, G., Shafer, B., Shoemaker, D. D., Sookhai-Mahadeo, S., Storms, R. K., Strathern, J. N., Valle, G., Voet, M., Volckaert, G., Wang, C. Y., Ward, T. R., Wilhelmy, J., Winzeler, E. A., Yang, Y. H., Yen, G., Youngman, E., Yu, K. X., Bussey, H., Boeke, J. D., Snyder, M., Philippsen, P., Davis, R. W., Johnston, M. 2002; 418 (6896): 387-391

    Abstract

    Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

    View details for DOI 10.1038/nature00935

    View details for PubMedID 12140549

  • Microtubule capture by the cleavage apparatus is required for proper spindle positioning in yeast GENES & DEVELOPMENT Kusch, J., Meyer, A., Snyder, M. P., Barral, Y. 2002; 16 (13): 1627-1639

    Abstract

    Cell division is the result of two major cytoskeletal events: partition of the chromatids by the mitotic spindle and cleavage of the cell by the cytokinetic apparatus. Spatial coordination of these events ensures that each daughter cell inherits a nucleus. Here we show that, in budding yeast, capture and shrinkage of astral microtubules at the bud neck is required to position the spindle relative to the cleavage apparatus. Capture required the septins and the microtubule-associated protein Kar9. Like Kar9-defective cells, cells lacking the septin ring failed to position their spindle correctly and showed an increased frequency of nuclear missegregation. Microtubule attachment at the bud neck was followed by shrinkage and a pulling action on the spindle. Enhancement of microtubule shrinkage at the bud neck required the Par-1-related, septin-dependent kinases (SDK) Hsl1 and Gin4. Neither the formin Bnr1 nor the actomyosin contractile ring was required for either microtubule capture or microtubule shrinkage. Together, our results indicate that septins and septin-dependent kinases may coordinate microtubule and actin functions in cell division.

    View details for DOI 10.1101/gad.222602

    View details for Web of Science ID 000176679100004

    View details for PubMedID 12101122

  • Large-scale identification of genes important for apical growth in Saccharomyces cerevisiae by directed allele replacement technology (DART) screening. Functional & integrative genomics Bidlingmaier, S., Snyder, M. 2002; 1 (6): 345-356

    Abstract

    In Saccharomyces cerevisiae, apical bud growth occurs for a brief period in G1 when the deposition of membrane and cell wall is restricted to the tip of the growing bud. To identify genes important for apical bud growth, we have utilized a novel transposon-based mutagenesis system termed DART (Directed Allele Replacement Technology) that allows the rapid transfer of defined insertion alleles into any strain background. A total of 4,810 insertion alleles affecting 1,392 different yeast genes were transferred into a cdc34-2 mutant strain that arrests in the apical growth phase when grown at the restrictive temperature of 37 degrees C. We identified 29 insertion alleles, containing mutations in 17 different genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, SEC22, FAB1, VPS36, VID22, RAS2, ECM33, OPI3, API1/YDR372c, API2/YDR525w, API3/YKR020w, and API4/YNL051w), which alter the elongated bud morphology of cdc34-2 cells arrested in the apical growth phase. Upon treatment with mating pheromone at 25 degrees C, cells containing insertion alleles affecting ten of these genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, FAB1, VPS36, VID22, and API2/YDR525w) form abnormal mating projections. Additionally, cells containing insertion alleles affecting SEC22, RAS2, API1/YDR372c, API3/YKR020w, and API4/YNL051w display severe mating projection formation defects at the elevated temperature of 37 degrees C. DART mutagenesis has many advantages over traditional mutagenesis methods and will be a useful tool for dissecting gene networks important for biological processes.

    View details for PubMedID 11957109

  • Bud-site selection and cell polarity in budding yeast CURRENT OPINION IN MICROBIOLOGY Casamayor, A., Snyder, M. 2002; 5 (2): 179-186

    Abstract

    Polarized growth involves a hierarchy of events such as selection of the growth site, polarization of the cytoskeleton to the selected growth site, and transport of secretory vesicles containing components required for growth. The budding yeast Saccharomyces cerevisiae is an excellent model system for the study of polarized cell growth. A large number of proteins have been found to be involved in these processes, although their mechanisms of action are not yet well-understood. Recent discoveries have helped elucidate many of the processes involved in cell polarity and bud-site selection in yeast and have modified the traditional view of cellular structures involved in these processes. This review focuses on recent advances on the roles of cortical tags, GTPases and the cytoskeleton in the generation and maintenance of cell polarity in yeast.

    View details for Web of Science ID 000175460500009

    View details for PubMedID 11934615

  • 'Omic' approaches for unraveling signaling networks CURRENT OPINION IN CELL BIOLOGY Zhu, H., Snyder, M. 2002; 14 (2): 173-179

    Abstract

    Signaling pathways are crucial for cell differentiation and response to cellular environments. Recently, a large number of approaches for the global analysis of genes and proteins have been described. These have provided important new insights into the components of different pathways and the molecular and cellular responses of these pathways. This review covers genomic and proteomic (collectively referred to as "omic") approaches for the global analysis of cell signaling, including gene expression profiling and analysis, protein-protein interaction methods, protein microarrays, mass spectrometry and gene-disruption and engineering approaches.

    View details for DOI 10.1016/S0955-0674(02)00315-0

    View details for Web of Science ID 000174193300007

    View details for PubMedID 11891116

  • Carbohydrate analysis prepares to enter the "omics" era CHEMISTRY & BIOLOGY Bidlingmaier, S., Snyder, M. 2002; 9 (4): 400-401

    Abstract

    In this issue, Houseman and Mrksich describe a carbohydrate array preparation method that can be used to analyze protein-carbohydrate interactions and to characterize the substrate specificity of a carbohydrate-modifying enzyme. Carbohydrate chips were prepared by a novel procedure that allows the covalent attachment of carbohydrate-diene conjugates to a specially engineered monolayer surface. The surface presents a precisely controllable ratio of reactive benzoquinone and inert ethylene glycol groups. Nonspecific adsorption of proteins to the surface is extremely low, and the surface is compatible with popular detection techniques. The immobilization technique was demonstrated to be compatible with recently developed automated solid phase carbohydrate synthesis methods, paving the way for the development of highly complex carbohydrate arrays.

    View details for Web of Science ID 000175379100002

    View details for PubMedID 11983329

  • Subcellular localization of the yeast proteome GENES & DEVELOPMENT Kumar, A., Agarwal, S., Heyman, J. A., Matson, S., Heidtman, M., Piccirillo, S., Umansky, L., Drawid, A., Jansen, R., Liu, Y., Cheung, K. H., Miller, P., Gerstein, M., Roeder, G. S., Snyder, M. 2002; 16 (6): 707-719

    Abstract

    Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass approximately 5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability--a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at http://ygac.med.yale.edu.
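
    The Bayesian extrapolation step can be illustrated with a single application of Bayes' rule. In this minimal sketch the priors are the four compartment proportions reported in the abstract, while the likelihood values, the single "feature" they describe, and the function name `posterior` are hypothetical placeholders chosen for illustration. The abstract does not describe the features or the model actually used.

```python
# Priors: compartment proportions reported in the abstract above.
PRIORS = {
    "cytoplasmic": 0.47,
    "mitochondrial": 0.13,
    "exocytic": 0.13,
    "nuclear": 0.27,
}

# HYPOTHETICAL likelihoods P(feature observed | compartment) for one
# made-up sequence feature; these numbers are not from the paper.
HYPOTHETICAL_LIKELIHOODS = {
    "cytoplasmic": 0.10,
    "mitochondrial": 0.05,
    "exocytic": 0.05,
    "nuclear": 0.60,
}


def posterior(priors, likelihoods):
    """Return P(compartment | feature) via Bayes' rule."""
    unnormalized = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("feature has zero probability under every class")
    return {c: value / total for c, value in unnormalized.items()}


if __name__ == "__main__":
    for compartment, p in posterior(PRIORS, HYPOTHETICAL_LIKELIHOODS).items():
        print(f"{compartment}: {p:.3f}")
```

    With these placeholder likelihoods, observing the feature shifts most of the probability mass to the nuclear compartment even though the cytoplasmic prior is larger.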

    View details for DOI 10.1101/gad.970902

    View details for Web of Science ID 000174516500007

    View details for PubMedID 11914276

  • GATA-1 binding sites mapped in the beta-globin locus by using mammalian ChIP-chip analysis PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Horak, C. E., Mahajan, M. C., Luscombe, N. M., Gerstein, M., Weissman, S. M., Snyder, M. 2002; 99 (5): 2924-2929

    Abstract

    The expression of the beta-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor GATA-1 is important for erythroid differentiation and has been implicated in regulating the expression of the erythroid-specific genes including the genes of the beta-globin locus. In the human erythroleukemic K562 cell line, only one DNA region has been identified previously as a putative site of GATA-1 interaction by in vivo footprinting studies. We mapped GATA-1 binding throughout the beta-globin locus by using ChIP-chip analysis of K562 cells. We found that GATA-1 binds in a region encompassing the HS2 core element, as was previously identified, and an additional region of GATA-1 binding upstream of the gammaG gene. This approach will be of general utility for mapping transcription factor binding sites within the beta-globin locus and throughout the genome.

    View details for DOI 10.1073/pnas.052706999

    View details for Web of Science ID 000174284600061

    View details for PubMedID 11867748

  • A question of size: the eukaryotic proteome and the problems in defining it NUCLEIC ACIDS RESEARCH Harrison, P. M., Kumar, A., Lang, N., Snyder, M., Gerstein, M. 2002; 30 (5): 1083-1090

    Abstract

    We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)

    View details for Web of Science ID 000174229900001

    View details for PubMedID 11861898

  • A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution JOURNAL OF MOLECULAR BIOLOGY Harrison, P., Kumar, A., Lan, N., Echols, N., Snyder, M., Gerstein, M. 2002; 316 (3): 409-419

    Abstract

    We surveyed the sequenced Saccharomyces cerevisiae genome (strain S288C) comprehensively for open reading frames (ORFs) that could encode full-length proteins but contain obvious mid-sequence disablements (frameshifts or premature stop codons). These pseudogenic features are termed disabled ORFs (dORFs). Using homology to annotated yeast ORFs and non-yeast proteins plus a simple region extension procedure, we have found 183 dORFs. Combined with the 38 existing annotations for potential dORFs, we have a total pool of up to 221 dORFs, corresponding to less than approximately 3% of the proteome. Additionally, we found 20 pairs of annotated ORFs for yeast that could be merged into a single ORF (termed a mORF) by read-through of the intervening stop codon, and may comprise a complete ORF in other yeast strains. Focussing on a core pool of 98 dORFs with a verifying protein homology, we find that most dORFs are substantially decayed, with approximately 90% having two or more disablements, and approximately 60% having four or more. dORFs are much more yeast-proteome specific than live yeast genes (having about half the chance that they are related to a non-yeast protein). They show a dramatically increased density at the telomeres of chromosomes, relative to genes. A microarray study shows that some dORFs are expressed even though they carry multiple disablements, and thus may be more resistant to nonsense-mediated decay. Many of the dORFs may be involved in responding to environmental stresses, as the largest functional groups include growth inhibition, flocculation, and the SRP/TIP1 family. Our results have important implications for proteome evolution. The characteristics of the dORF population suggest the sorts of genes that are likely to fall in and out of usage (and vary in copy number) in a strain-specific way and highlight the role of subtelomeric regions in engendering this diversity. 
    Our results also have important implications for the effects of the [PSI+] prion. The dORFs disabled by only a single stop and the mORFs (together totalling 35) provide an estimate for the extent of the sequence population that can be resurrected readily through the demonstrated ability of the [PSI+] prion to cause nonsense-codon read-through. Also, the dORFs and mORFs that we find have properties (e.g. growth inhibition, flocculation, vanadate resistance, stress response) that are potentially related to the ability of [PSI+] to engender substantial phenotypic variation in yeast strains under different environmental conditions. (See genecensus.org/pseudogene for further information.)
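
    One kind of mid-sequence disablement, the premature stop codon, can be detected with a simple in-frame scan. The sketch below is a minimal illustration only: it assumes the reading frame starts at the first base and uses the standard stop codons, and the function name `premature_stops` is a hypothetical helper. The survey itself relied on homology to annotated proteins plus a region extension procedure, which this sketch does not attempt to reproduce.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def premature_stops(dna):
    """Return codon indices of in-frame stops that precede the final codon.

    Assumes the reading frame begins at position 0 of `dna`; any trailing
    bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is the expected terminator, so it is excluded.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
```

    In the abstract's terms, a sequence for which this scan reports exactly one stop is the kind of candidate that nonsense-codon read-through could restore. Detecting frameshifts would require alignment to a homolog, which this sketch omits.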

    View details for DOI 10.1006/jmbi.2001.5343

    View details for Web of Science ID 000174216400001

    View details for PubMedID 11866506

  • An integrated approach for finding overlooked genes in yeast NATURE BIOTECHNOLOGY Kumar, A., Harrison, P. M., Cheung, K. H., Lan, N., Echols, N., Bertone, P., Miller, P., Gerstein, M. B., Snyder, M. 2002; 20 (1): 58-63

    Abstract

    We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to beta-galactosidase (beta-gal); non-annotated open reading frames (ORFs) translated as beta-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.

    View details for Web of Science ID 000173031600037

    View details for PubMedID 11753363

  • Insertional mutagenesis: Transposon-insertion libraries as mutagens in yeast GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Kumar, A., Vidan, S., Snyder, M. 2002; 350: 219-229

    View details for Web of Science ID 000176466300012

    View details for PubMedID 12073314

  • ChIP-chip: A genomic approach for identifying transcription factor binding sites GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Horak, C. E., Snyder, M. 2002; 350: 469-483

    View details for Web of Science ID 000176466300026

    View details for PubMedID 12073330

  • The TRIPLES database: a community resource for yeast molecular biology NUCLEIC ACIDS RESEARCH Kumar, A., Cheung, K. H., Tosches, N., Masiar, P., Liu, Y., Miller, P., Snyder, M. 2002; 30 (1): 73-75

    Abstract

    TRIPLES is a web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces cerevisiae, a relational database housing nearly half a million data points generated from an ongoing study using large-scale transposon mutagenesis to characterize gene function in yeast. At present, TRIPLES contains three principal data sets (i.e. phenotypic data, protein localization data and expression data) for over 3500 annotated yeast genes as well as several hundred non-annotated open reading frames. In addition, the TRIPLES web site provides online order forms linked to each data set so that users may request any strain or reagent generated from this project free of charge. In response to user requests, the TRIPLES web site has undergone several recent modifications. Our localization data have been supplemented with approximately 500 fluorescent micrographs depicting actual staining patterns observed upon indirect immunofluorescence analysis of indicated epitope-tagged proteins. These localization data, as well as all other data sets within TRIPLES, are now available in full as tab-delimited text. To accommodate increased reagent requests, all orders are now cataloged in a separate database, and users are notified immediately of order receipt and shipment. Also, TRIPLES is one of five sites incorporated into the new functional analysis tool Function Junction provided by the Saccharomyces Genome Database. TRIPLES may be accessed from the Yale Genome Analysis Center (YGAC) homepage at http://ygac.med.yale.edu.

    View details for Web of Science ID 000173077100018

    View details for PubMedID 11752258

  • YMD: A microarray database for large-scale gene expression analysis Annual Symposium of the American-Medical-Informatics-Association Cheung, K. H., White, K., Hager, J., Gerstein, M., Reinke, V., Nelson, K., Masiar, P., Srivastava, R., Li, Y. L., Li, J., Zhao, H. Y., Li, J. M., Allison, D. B., Snyder, M., Miller, P., Williams, K. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 140–144

    Abstract

    The use of microarray technology to perform parallel analysis of the expression pattern of a large number of genes in a single experiment has created a new frontier of medical research. The vast amount of gene expression data generated from multiple microarray experiments requires a robust database system that allows efficient data storage, retrieval, secure access, data dissemination, and integrated data analyses. To address the growing needs of microarray researchers at Yale and their collaborators, we have built the Yale Microarray Database (YMD). YMD is Web-accessible with the following features: (i) a Web program that tracks DNA samples between source plates and arrays, (ii) the capability of finding common genes/clones across different array platforms, (iii) an image file server, (iv) laboratory-based user management and access privileges, (v) project management, (vi) template data entry, (vii) linking gene expression data to annotation databases for functional analysis. YMD is currently being used on a pilot basis by several laboratories for different organisms and array platforms.

    View details for Web of Science ID 000189418100029

    View details for PubMedID 12463803

  • Phosphorylation of gamma-tubulin regulates microtubule organization in budding yeast DEVELOPMENTAL CELL Vogel, J., Drapkin, B., Oomen, J., Beach, D., Bloom, K., Snyder, M. 2001; 1 (5): 621-631

    Abstract

    gamma-Tubulin is essential for microtubule nucleation in yeast and other organisms; whether this protein is regulated in vivo has not been explored. We show that the budding yeast gamma-tubulin (Tub4p) is phosphorylated in vivo. Hyperphosphorylated Tub4p isoforms are restricted to G1. A conserved tyrosine near the carboxy terminus (Tyr445) is required for phosphorylation in vivo. A point mutation, Tyr445 to Asp, causes cells to arrest prior to anaphase. The frequency of new microtubules appearing in the SPB region and the number of microtubules are increased in tub4-Y445D cells, suggesting this mutation promotes microtubule assembly. These data suggest that modification of gamma-tubulin is important for controlling microtubule number, thereby influencing microtubule organization and function during the yeast cell cycle.

    View details for Web of Science ID 000175301700008

    View details for PubMedID 11709183

  • A filamentous growth response mediated by the yeast mating pathway GENETICS Erdman, S., Snyder, M. 2001; 159 (3): 919-928

    Abstract

    Haploid cells of the budding yeast Saccharomyces cerevisiae respond to mating pheromones by arresting their cell-division cycle in G1 and differentiating into a cell type capable of locating and fusing with mating partners. Yeast cells undergo chemotactic cell surface growth when pheromones are present above a threshold level for morphogenesis; however, the morphogenetic responses of cells to levels of pheromone below this threshold have not been systematically explored. Here we show that MATa haploid cells exposed to low levels of the alpha-factor mating pheromone undergo a novel cellular response: cells modulate their division patterns and cell shape, forming colonies composed of filamentous chains of cells. Time-lapse analysis of filament formation shows that its dynamics are distinct from that of pseudohyphal growth; during pheromone-induced filament formation, daughter cells are delayed relative to mother cells with respect to the timing of bud emergence. Filament formation requires the RSR1(BUD1), BUD8, SLK1/BCK1, and SPA2 genes and many elements of the STE11/STE7 MAP kinase pathway; this response is also independent of FAR1, a gene involved in orienting cell polarization during the mating response. We suggest that mating yeast cells undergo a complex response to low levels of pheromone that may enhance the ability of cells to search for mating partners through the modification of cell shape and alteration of cell-division patterns.

    View details for Web of Science ID 000172665800002

    View details for PubMedID 11729141

  • Global analysis of protein activities using proteome chips SCIENCE Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T., Mitchell, T., Miller, P., Dean, R. A., Gerstein, M., Snyder, M. 2001; 293 (5537): 2101-2105

    Abstract

    To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.

    View details for Web of Science ID 000171028700077

    View details for PubMedID 11474067

  • A genomic study of the bipolar bud site selection pattern in Saccharomyces cerevisiae MOLECULAR BIOLOGY OF THE CELL Ni, L., Snyder, M. 2001; 12 (7): 2147-2170

    Abstract

    A genome-wide screen of 4168 homozygous diploid yeast deletion strains has been performed to identify nonessential genes that participate in the bipolar budding pattern. By examining bud scar patterns representing the sites of previous cell divisions, 127 mutants representing three different phenotypes were found: unipolar, axial-like, and random. From this screen, 11 functional classes of known genes were identified, including those involved in actin-cytoskeleton organization, general bud site selection, cell polarity, vesicular transport, cell wall synthesis, protein modification, transcription, nuclear function, translation, and other functions. Four characterized genes that were not known previously to participate in bud site selection were also found to be important for the haploid axial budding pattern. In addition to known genes, we found 22 novel genes (20 are designated BUD13-BUD32) important for bud site selection. Deletion of one resulted in unipolar budding exclusively from the proximal pole, suggesting that this gene plays an important role in diploid distal budding. Mutations in 20 other novel BUD genes produced a random budding phenotype and one produced an axial-like budding defect. Several of the novel Bud proteins were fused to green fluorescent protein; two proteins were found to localize to sites of polarized cell growth (i.e., the bud tip in small budded cells and the neck in cells undergoing cytokinesis), similar to that postulated for the bipolar signals and proteins that target cell division site tags to their proper location in the cell. Four others localized to the nucleus, suggesting that they play a role in gene expression. The bipolar distal marker Bud8 was localized in a number of mutants; many showed an altered Bud8-green fluorescent protein localization pattern.
    Through the genome-wide identification and analysis of different mutants involved in bipolar bud site selection, an integrated pathway for this process is presented in which proximal and distal bud site selection tags are synthesized and localized at their appropriate poles, thereby directing growth at those sites. Genome-wide screens of defined collections of mutants hold significant promise for dissecting many biological processes in yeast.

    View details for Web of Science ID 000170350300019

    View details for PubMedID 11452010

  • Genome-wide transposon mutagenesis in yeast. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Kumar, A., Snyder, M. 2001; Chapter 13: Unit 13.3

    Abstract

    This unit provides comprehensive protocols for the use of insertional libraries generated by shuttle mutagenesis. From the basic protocol, a small aliquot of insertional library DNA may be used to mutagenize yeast, producing strains containing a single transposon insertion within a transcribed and translated region of the genome. This transposon-mutagenized bank of yeast strains may be screened for any desired mutant phenotype. Alternatively, since the transposon contains a reporter gene lacking its start codon and promoter, transposon-tagged strains may also be screened for specific patterns of gene expression. Strains of interest may be characterized by vectorette PCR (protocol provided) in order to locate the precise genomic site of transposon insertion within each mutant. A method by which Cre/lox recombination may be used to reduce the transposon in yeast to a small insertion element encoding an epitope tag is described. This tag serves as a tool by which transposon-mutagenized gene products may be analyzed further (e.g., localized to a discrete subcellular site).

    View details for DOI 10.1002/0471142727.mb1303s51

    View details for PubMedID 18265099

  • Emerging technologies in yeast genomics NATURE REVIEWS GENETICS Kumar, A., Snyder, M. 2001; 2 (4): 302-312

    Abstract

    The genomic revolution is undeniable: in the past year alone, the term 'genomics' was found in nearly 500 research articles, and at least 6 journals are devoted solely to genomic biology. More than just a buzzword, molecular biology has genuinely embraced genomics (the systematic, large-scale study of genomes and their functions). With its facile genetics, the budding yeast Saccharomyces cerevisiae has emerged as an important model organism in the development of many current genomic methodologies. These techniques have greatly influenced the manner in which biology is studied in yeast and in other organisms. In this review, we summarize the most promising technologies in yeast genomics.

    View details for Web of Science ID 000167837900015

    View details for PubMedID 11283702

  • The Cbk1p pathway is important for polarized cell growth and cell separation in Saccharomyces cerevisiae MOLECULAR AND CELLULAR BIOLOGY Bidlingmaier, S., Weiss, E. L., Seidel, C., Drubin, D. G., Snyder, M. 2001; 21 (7): 2449-2462

    Abstract

    During the early stages of budding, cell wall remodeling and polarized secretion are concentrated at the bud tip (apical growth). The CBK1 gene, encoding a putative serine/threonine protein kinase, was identified in a screen designed to isolate mutations that affect apical growth. Analysis of cbk1Delta cells reveals that Cbk1p is required for efficient apical growth, proper mating projection morphology, bipolar bud site selection in diploid cells, and cell separation. Epitope-tagged Cbk1p localizes to both sides of the bud neck in late anaphase, just prior to cell separation. CBK1 and another gene, HYM1, were previously identified in a screen for genes involved in transcriptional repression and proposed to function in the same pathway. Deletion of HYM1 causes phenotypes similar to those observed in cbk1Delta cells and disrupts the bud neck localization of Cbk1p. Whole-genome transcriptional analysis of cbk1Delta suggests that the kinase regulates the expression of a number of genes with cell wall-related functions, including two genes required for efficient cell separation: the chitinase-encoding gene CTS1 and the glucanase-encoding gene SCW11. The Ace2p transcription factor is required for expression of CTS1 and has been shown to physically interact with Cbk1p. Analysis of ace2Delta cells reveals that Ace2p is required for cell separation but not for polarized growth. Our results suggest that Cbk1p and Hym1p function to regulate two distinct cell morphogenesis pathways: an ACE2-independent pathway that is required for efficient apical growth and mating projection formation and an ACE2-dependent pathway that is required for efficient cell separation following cytokinesis. Cbk1p is most closely related to the Neurospora crassa Cot-1; Schizosaccharomyces pombe Orb6; Caenorhabditis elegans, Drosophila, and human Ndr; and Drosophila and mammalian WARTS/LATS kinases. Many Cbk1-related kinases have been shown to regulate cellular morphology.

    View details for Web of Science ID 000167451500019

    View details for PubMedID 11259593

  • Protein arrays and microarrays CURRENT OPINION IN CHEMICAL BIOLOGY Zhu, H., Snyder, M. 2001; 5 (1): 40-45

    Abstract

    In the past, studies of protein activities have focused on studying a single protein at a time, which is often time-consuming and expensive. Recently, with the sequencing of entire genomes, large-scale proteome analysis has begun. Arrays of proteins have been used for the determination of subcellular localization, analysis of protein-protein interactions and biochemical analysis of protein function. New protein-microarray technologies have been introduced that enable the high-throughput analysis of protein activities. These have the potential to revolutionize the analysis of entire proteomes.

    View details for Web of Science ID 000167051500006

    View details for PubMedID 11166646

  • Large-scale mutagenesis: yeast genetics in the genome era CURRENT OPINION IN BIOTECHNOLOGY Vidan, S., Snyder, M. 2001; 12 (1): 28-34

    Abstract

    The completion of the DNA sequence of the budding yeast Saccharomyces cerevisiae resulted in the identification of a large number of genes. However, the function of most of these genes is not known. One of the best ways to determine gene function is to carry out mutational and phenotypic analysis. In recent years, several approaches have been developed for the mutational analysis of yeast genes on a large scale. These include transposon-based insertional mutagenesis, and systematic deletions using PCR-based approaches. These projects have produced collections of yeast strains and plasmid alleles that can be screened using novel approaches. Analysis of these collections by the scientific community promises to reveal a great deal of biological information about this organism.

    View details for Web of Science ID 000167209900005

    View details for PubMedID 11167069

  • A metadata framework for interoperating heterogeneous genome data using XML Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Cheung, K. H., Deshpande, A. M., Tosches, N., Nath, S., Agrawal, A., Miller, P., Kumar, A., Snyder, M. BMJ PUBLISHING GROUP. 2001: 110–114

    Abstract

    The rapid advances in the Human Genome Project and genomic technologies have produced massive amounts of data populated in a large number of network-accessible databases. These technological advances and the associated data can have a great impact on biomedicine and healthcare. To answer many of the biologically or medically important questions, researchers often need to integrate data from a number of independent but related genome databases. One common practice is to download data sets (text files) from various genome Web sites and process them by some local programs. One main problem with this approach is that these programs are written on a case-by-case basis because the data sets involved are heterogeneous in structure. To address this problem, we define metadata that maps these heterogeneously structured files into a common eXtensible Markup Language (XML) structure to facilitate data interoperation. We illustrate this approach by interoperating two sets of essential yeast genes that are stored in two yeast genome databases (MIPS and YPD).

    View details for Web of Science ID 000172263400024

    View details for PubMedID 11825164

  • The carboxy terminus of Tub4p is required for gamma-tubulin function in budding yeast JOURNAL OF CELL SCIENCE Vogel, J., Snyder, M. 2000; 113 (21): 3871-3882

    Abstract

    The role of gamma-tubulin in microtubule nucleation is well established; however, its function in other aspects of microtubule organization is unknown. The carboxy termini of alpha/beta-tubulins influence the assembly and stability of microtubules. We investigated the role of the carboxy terminus of yeast gamma-tubulin (Tub4p) in microtubule organization. This region consists of a conserved domain (DSYLD) and an acidic tail. Cells expressing truncations lacking the DSYLD domain, tail or both regions are temperature sensitive for growth. Growth defects of tub4 mutants lacking either or both carboxy-terminal domains are suppressed by the microtubule destabilizing drug benomyl. tub4 carboxy-terminal mutants arrest as large budded cells with short bipolar spindles positioned at the bud neck. Electron microscopic analysis of wild-type and CTR mutant cells reveals that SPBs are tightly associated with the bud neck/cortex by cytoplasmic microtubules in mutants lacking the tail region (tub4-delta 444, tub4-delta 448). Mutants lacking the DSYLD residues (tub4-delta 444, tub4-delta DSYLD) form many cytoplasmic microtubules. We propose that the carboxy terminus of Tub4p is required for re-organization of the microtubules upon completion of nuclear migration, and facilitates spindle elongation into the bud.

    View details for Web of Science ID 000165515000019

    View details for PubMedID 11034914

  • Analysis of yeast protein kinases using protein chips NATURE GENETICS Zhu, H., Klemic, J. F., Chang, S., Bertone, P., Casamayor, A., Klemic, K. G., Smith, D., Gerstein, M., Reed, M. A., Snyder, M. 2000; 26 (3): 283-289

    Abstract

    We have developed a novel protein chip technology that allows the high-throughput analysis of biochemical activities, and used this approach to analyse nearly all of the protein kinases from Saccharomyces cerevisiae. Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides. The high density and small size of the wells allows for high-throughput batch processing and simultaneous analysis of many individual samples. Only small amounts of protein are required. Of 122 known and predicted yeast protein kinases, 119 were overexpressed and analysed using 17 different substrates and protein chips. We found many novel activities and that a large number of protein kinases are capable of phosphorylating tyrosine. The tyrosine phosphorylating enzymes often share common amino acid residues that lie near the catalytic region. Thus, our study identified a number of novel features of protein kinases and demonstrates that protein chip technology is useful for high-throughput screening of protein biochemical activity.

    View details for Web of Science ID 000165176500015

    View details for PubMedID 11062466

  • New antimicrobial flavanones from Physena madagascariensis JOURNAL OF NATURAL PRODUCTS Deng, Y. H., Lee, J. P., Tianasoa-Ramamonjy, M., Snyder, J. K., Des Etages, S. A., Kanada, D., Snyder, M. P., Turner, C. J. 2000; 63 (8): 1082-1089

    Abstract

    Two new flavanones (1 and 2) with antibacterial activity were isolated from the methanolic extract of the dried leaves of Physena madagascariensis using activity against Staphylococcus aureus to guide the isolation. A third flavonoid, a flavanone dimer linked by a methylene group (3) was also isolated and proved to be inactive. The structures of 1 and 2 were established primarily from NMR studies, while that of 3 required more extensive mass spectrometric analysis. All three flavanones had lavandulyl units in the limonene form. Flavanones 1 and 2 were active against several bacteria at concentrations as low as 4 microM.

    View details for DOI 10.1021/np000054m

    View details for Web of Science ID 000089056600009

    View details for PubMedID 10978202

  • Polarized growth controls cell shape and bipolar bud site selection in Saccharomyces cerevisiae MOLECULAR AND CELLULAR BIOLOGY Sheu, Y. J., Barral, Y., Snyder, M. 2000; 20 (14): 5235-5247

    Abstract

    We examined the relationship between polarized growth and division site selection, two fundamental processes important for proper development of eukaryotes. Diploid Saccharomyces cerevisiae cells exhibit an ellipsoidal shape and a specific division pattern (a bipolar budding pattern). We found that the polarity genes SPA2, PEA2, BUD6, and BNI1 participate in a crucial step of bud morphogenesis, apical growth. Deleting these genes results in round cells and diminishes bud elongation in mutants that exhibit pronounced apical growth. Examination of distribution of the polarized secretion marker Sec4 demonstrates that spa2Delta, pea2Delta, bud6Delta, and bni1Delta mutants fail to concentrate Sec4 at the bud tip during apical growth and at the division site during repolarization just prior to cytokinesis. Moreover, cell surface expansion is not confined to the distal tip of the bud in these mutants. In addition, we found that the p21-activated kinase homologue Ste20 is also important for both apical growth and bipolar bud site selection. We further examined how the duration of polarized growth affects bipolar bud site selection by using mutations in cell cycle regulators that control the timing of growth phases. The grr1Delta mutation enhances apical growth by stabilizing G(1) cyclins and increases the distal-pole budding in diploids. Prolonging polarized growth phases by disrupting the G(2)/M cyclin gene CLB2 enhances the accuracy of bud site selection in wild-type, spa2Delta, and ste20Delta cells, whereas shortening the polarized growth phases by deleting SWE1 decreases the fidelity of bipolar budding. This study reports the identification of components required for apical growth and demonstrates the critical role of polarized growth in bipolar bud site selection. We propose that apical growth and repolarization at the site of cytokinesis are crucial for establishing spatial cues used by diploid yeast cells to position division planes.

    View details for Web of Science ID 000087820000027

    View details for PubMedID 10866679

  • The Kar3p kinesin-related protein forms a novel heterodimeric structure with its associated protein Cik1p MOLECULAR BIOLOGY OF THE CELL Barrett, J. G., Manning, B. D., Snyder, M. 2000; 11 (7): 2373-2385

    Abstract

    Proteins that physically associate with members of the kinesin superfamily are critical for the functional diversity observed for these microtubule motor proteins. However, quaternary structures of complexes between kinesins and kinesin-associated proteins are poorly defined. We have analyzed the nature of the interaction between the Kar3 motor protein, a minus-end-directed kinesin from yeast, and its associated protein Cik1. Extraction experiments demonstrate that Kar3p and Cik1p are tightly associated. Mapping of the interaction domains of the two proteins by two-hybrid analyses indicates that Kar3p and Cik1p associate in a highly specific manner along the lengths of their respective coiled-coil domains. Sucrose gradient velocity centrifugation and gel filtration experiments were used to determine the size of the Kar3-Cik1 complex from both mating pheromone-treated cells and vegetatively growing cells. These experiments predict a size for this complex that is consistent with that of a heterodimer containing one Kar3p subunit and one Cik1p subunit. Finally, immunoprecipitation of epitope-tagged and untagged proteins confirms that only one subunit of Kar3p and Cik1p are present in the Kar3-Cik1 complex. These findings demonstrate that the Kar3-Cik1 complex has a novel heterodimeric structure not observed previously for kinesin complexes.

    View details for Web of Science ID 000088184800016

    View details for PubMedID 10888675

  • Drivers and passengers wanted! The role of kinesin-associated proteins TRENDS IN CELL BIOLOGY Manning, B. D., Snyder, M. 2000; 10 (7): 281-289

    Abstract

    Members of the kinesin superfamily of proteins participate in a wide variety of cellular processes. Although much attention has been devoted to the structural and biophysical properties of the force-generating motor domain of kinesins, the factors controlling the functional specificity of each kinesin have only recently been examined. Genetic and biochemical approaches have identified two classes of proteins that associate physically with the diverse non-motor domains of kinesins. These proteins can be divided into two general classes: first, those that form tight complexes with the kinesin and are instrumental in directing the distinct function of the motor (i.e. drivers) and, second, those proteins that might transiently interact with the motor or be an integral part of the motor's cargo (i.e. passengers). Here, we discuss known kinesin-binding proteins, and how they might participate in the activity of their motor partners.

    View details for Web of Science ID 000087769300004

    View details for PubMedID 10856931

  • Genome-wide mutant collections: toolboxes for functional genomics CURRENT OPINION IN MICROBIOLOGY Coelho, P. S., Kumar, A., Snyder, M. 2000; 3 (3): 309-315

    Abstract

    The sequencing of entire genomes has led to the identification of many genes. A future challenge will be to determine the function of all of the genes of an organism. One of the best ways to ascertain function is to disrupt genes and determine the phenotype of the resulting organism. Novel large-scale approaches for generating gene disruptions and analyzing the resulting phenotype are underway in the budding yeast Saccharomyces cerevisiae and other organisms including flies, Mycoplasma, worms, plants and mice. These approaches and mutant collections will be extremely valuable to the scientific community and will dramatically alter the manner in which science is performed in the future.

    View details for Web of Science ID 000087635200015

    View details for PubMedID 10851164

  • An integrated web interface for large-scale characterization of sequence data. Functional & integrative genomics Cheung, K. H., Kumar, A., Snyder, M., Miller, P. 2000; 1 (1): 70-75

    Abstract

    Large-scale genome projects require the analysis of large amounts of raw data. This analysis often involves the application of a chain of biology-based programs. Many of these programs are difficult to operate because they are non-integrated, command-line driven, and platform-dependent. The problem is compounded when the number of data files involved is large, making navigation and status-tracking difficult. To demonstrate how this problem can be addressed, we have created a platform-independent Web front end that integrates a set of programs used in a genomic project analyzing gene function by transposon mutagenesis in Saccharomyces cerevisiae. In particular, these programs help define a large number of transposon insertion events within the yeast genome, identifying both the precise site of transposon insertion as well as potential open reading frames disrupted by this insertion event. Our Web interface facilitates this analysis by performing the following tasks. Firstly, it allows each of the analysis programs to be launched against multiple directories of data files. Secondly, it allows the user to view, download, and upload files generated by the programs. Thirdly, it indicates which sets of data directories have been processed by each program. Although designed specifically to aid in this project, our interface exemplifies a general approach by which independent software programs may be integrated into an efficient protocol for large-scale genomic data processing.

    View details for PubMedID 11793223

  • Compartmentalization of the cell cortex by septins is required for maintenance of cell polarity in yeast MOLECULAR CELL Barral, Y., Mermall, V., Mooseker, M. S., Snyder, M. 2000; 5 (5): 841-851

    Abstract

    Formation and maintenance of specialized plasma membrane domains are crucial for many biological processes, such as cell polarization and signaling. During isotropic bud growth, the yeast cell periphery is divided into two domains: the bud surface, an active site of exocytosis and growth, and the relatively quiescent surface of the mother cell. We found that cells lacking septins at the bud neck failed to maintain the exocytosis and morphogenesis factors Spa2, Sec3, Sec5, and Myo2 in the bud during isotropic growth. Furthermore, we found that septins were required for proper regulation of actin patch stability; septin-defective cells permitted to enter isotropic growth lost actin and growth polarity. We propose that septins maintain cell polarity by specifying a boundary between cortical domains.

    View details for Web of Science ID 000087332500008

    View details for PubMedID 10882120

  • Regulation of cytokinesis by the Elm1 protein kinase in Saccharomyces cerevisiae JOURNAL OF CELL SCIENCE Bouquin, N., Barral, Y., Courbeyrette, R., Blondel, M., Snyder, M., Mann, C. 2000; 113 (8): 1435-1445

    Abstract

    A Saccharomyces cerevisiae mutant unable to grow in a cdc28-1N background was isolated and shown to be affected in the ELM1 gene. Elm1 is a protein kinase, thought to be a negative regulator of pseudo-hyphal growth. We show that Cdc11, one of the septins, is delocalised in the mutant, indicating that septin localisation is partly controlled by Elm1. Moreover, we show that cytokinesis is delayed in an elm1delta mutant. Elm1 levels peak at the end of the cell cycle and Elm1 is localised at the bud neck in a septin-dependent fashion from bud emergence until the completion of anaphase, at about the time of cell division. Genetic and biochemical evidence suggest that Elm1 and the three other septin-localised protein kinases, Hsl1, Gin4 and Kcc4, work in parallel pathways to regulate septin behaviour and cytokinesis. In addition, the elm1delta morphological defects can be suppressed by deletion of the SWE1 gene, but not the cytokinesis defect nor the septin mislocalisation. Our results indicate that cytokinesis in budding yeast is regulated by Elm1.

    View details for Web of Science ID 000086855200012

    View details for PubMedID 10725226

  • Sbe2p and Sbe22p, two homologous Golgi proteins involved in yeast cell wall formation MOLECULAR BIOLOGY OF THE CELL Santos, B., Snyder, M. 2000; 11 (2): 435-452

    Abstract

    The cell wall of fungal cells is important for cell integrity and cell morphogenesis and protects against harmful environmental conditions. The yeast cell wall is a complex structure consisting mainly of mannoproteins, glucan, and chitin. The molecular mechanisms by which the cell wall components are synthesized and transported to the cell surface are poorly understood. We have identified and characterized two homologous yeast proteins, Sbe2p and Sbe22p, through their suppression of a chs5 spa2 mutant strain defective in chitin synthesis and cell morphogenesis. Although sbe2 and sbe22 null mutants are viable, sbe2 sbe22 cells display several phenotypes indicative of defects in cell integrity and cell wall structure. First, sbe2 sbe22 cells display a sorbitol-remediable lysis defect at 37 degrees C and are hypersensitive to SDS and calcofluor. Second, electron microscopic analysis reveals that sbe2 sbe22 cells have an aberrant cell wall structure with a reduced mannoprotein layer. Finally, immunofluorescence experiments reveal that in small-budded cells, sbe2 sbe22 mutants mislocalize Chs3p, a protein involved in chitin synthesis. In addition, sbe2 sbe22 diploids have a bud-site selection defect, displaying a random budding pattern. A Sbe2p-GFP fusion protein localizes to cytoplasmic patches, and Sbe2p cofractionates with Golgi proteins. Deletion of CHS5, which encodes a Golgi protein involved in the transport of Chs3p to the cell periphery, is lethal in combination with disruption of SBE2 and SBE22. Thus, we suggest a model in which Sbe2p and Sbe22p are involved in the transport of cell wall components from the Golgi apparatus to the cell surface periphery in a pathway independent of Chs5p.

    View details for Web of Science ID 000085478500003

    View details for PubMedID 10679005

  • Mutagenesis of murine cytomegalovirus using a Tn3-based transposon VIROLOGY Zhan, X. Y., Lee, M., Abenes, G., Von Reis, I., Kittinunvorakoon, C., Ross-Macdonald, P., Snyder, M., Liu, F. Y. 2000; 266 (2): 264-274

    Abstract

    A transposon derived from Escherichia coli Tn3 was introduced into the genome of murine cytomegalovirus (MCMV) to generate a pool of viral mutants. We analyzed three of the constructed recombinant viruses that contained the transposon within the M25, M27, and m155 open reading frames. Our studies provide the first direct evidence to suggest that M25 and M27 are not essential for viral replication in mouse NIH 3T3 cells. Studies in cultured cells and Balb/c mice indicated that the transposon insertion is stable during viral propagation both in vitro and in vivo. Moreover the virus that contained the insertion mutation in M25 exhibited a titer similar to that of the wild-type virus in the salivary glands, lungs, livers, spleens, and kidneys of the Balb/c mice that were intraperitoneally infected with these viruses. These results suggest that M25 is dispensable for viral growth in these organs and the presence of the transposon sequence in the viral genome does not significantly affect viral replication in vivo. The Tn3-based system can be used as a mutagenesis approach for studying the function of MCMV genes in both tissue culture and in animals.

    View details for Web of Science ID 000085018400005

    View details for PubMedID 10639313

  • Graphically-enabled integration of bioinformatics tools allowing parallel execution Annual Symposium of the American-Medical-Informatics-Association Cheung, K. H., Miller, P., Sherman, A., Weston, S., Stratmann, E., Schultz, M., Snyder, M., Kumar, A. HANLEY & BELFUS INC. 2000: 141–145

    Abstract

    Rapid analysis of large amounts of genomic data is of great biological as well as medical interest. This type of analysis will greatly benefit from the ability to rapidly assemble a set of related analysis programs and to exploit the power of parallel computing. TurboGenomics, which is a software package currently in its alpha-testing phase, allows integration of heterogeneous software components to be done graphically. In addition, the tool is capable of making the integrated components run in parallel. To demonstrate these abilities, we use the tool to develop a Web-based application that allows integrated access to a set of large-scale sequence data analysis programs used by a transposon-insertion based yeast genome project. We also contrast the differences in building such an application with and without using the TurboGenomics software.

    View details for Web of Science ID 000170207500030

    View details for PubMedID 11079861

  • TRIPLES: a database of gene function in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Kumar, A., Cheung, K. H., Ross-Macdonald, P., Coelho, P. S., Miller, P., Snyder, M. 2000; 28 (1): 81-84

    Abstract

    Using a novel multipurpose mini-transposon, we have generated a collection of defined mutant alleles for the analysis of disruption phenotypes, protein localization, and gene expression in Saccharomyces cerevisiae. To catalog this unique data set, we have developed TRIPLES, a Web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces. Encompassing over 250 000 data points, TRIPLES provides convenient access to information from nearly 7800 transposon-mutagenized yeast strains; within TRIPLES, complete data reports of each strain may be viewed in table format, or if desired, downloaded as tab-delimited text files. Each report contains external links to corresponding entries within the Saccharomyces Genome Database and International Nucleic Acid Sequence Data Library (GenBank). Unlike other yeast databases, TRIPLES also provides on-line order forms linked to each clone report; users may immediately request any desired strain free-of-charge by submitting a completed form. In addition to presenting a wealth of information for over 2300 open reading frames, TRIPLES constitutes an important medium for the distribution of useful reagents throughout the yeast scientific community. Maintained by the Yale Genome Analysis Center, TRIPLES may be accessed at http://ycmi.med.yale.edu/ygac/triples.htm

    View details for Web of Science ID 000084896300021

    View details for PubMedID 10592187

  • gamma-tubulin of budding yeast CENTROSOME IN CELL REPLICATION AND EARLY DEVELOPMENT Vogel, J., Snyder, M. 2000; 49: 75-104

    View details for Web of Science ID 000165501500004

    View details for PubMedID 11005015

  • High-throughput methods for the large-scale analysis of gene function by transposon tagging APPLICATIONS OF CHIMERIC GENES AND HYBRID PROTEINS, PT C Kumar, A., Des Etages, S. A., Coelho, P. S., Roeder, G. S., Snyder, M. 2000; 328: 550-574

    View details for Web of Science ID 000166565300033

    View details for PubMedID 11075366

  • Large-scale analysis of the yeast genome by transposon tagging and gene disruption NATURE Ross-Macdonald, P., Coelho, P. S., Roemer, T., Agarwal, S., Kumar, A., Jansen, R., Cheung, K. H., Sheehan, A., Symoniatis, D., Umansky, L., Heldtman, M., Nelson, F. K., Iwasaki, H., Hager, K., Gerstein, M., Miller, P., Roeder, G. S., Snyder, M. 1999; 402 (6760): 413-418

    Abstract

    Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background--a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.

    View details for Web of Science ID 000083913600057

    View details for PubMedID 10586881

  • Rationale and design of the National Emphysema Treatment Trial (NETT): A prospective randomized trial of lung volume reduction surgery JOURNAL OF THORACIC AND CARDIOVASCULAR SURGERY Rodarte, J., Miller, C., Barnard, P., Carter, J., DuBose, K., Flanigan, T., Fox, P., Haddad, J., Hale, K., Hood, E., Jahn, A., King, K., Nguyen, C., Norman, S., Officer, T., Reardon, M., Ricketts, J., Sax, S., Tucker, M., Williams, K., Reilly, J., Sugarbaker, D., Fanning, C., Birkenmaier, K., Body, S., Catanzano, C., Duffy, S., Formanek, V., Fuhlbrigge, A., Hartigan, P., Hunsaker, A., Jacobson, F., Mark, L., Russell, R., Saunders, D., Simons, G., Swanson, S., McKenna, R., Mohsenifar, Z., Geaga, C., Aberle, D., Brown, J., Clark, S., Cooper, C., Ferrill, R., Frantz, R., Gelb, A., Goldin, J., Gordon, J., Head, D., Joyner, M., Julien, P., Levine, M., Lewis, M., Pendio, M., Silverman, J., Walker, P., Williams, B., Yegyan, V., Yoou, C., Maurer, J., DeCamp, M., Meli, Y., Aviv, L., Hearn, C., Kraenzler, E., Marlow, S., McCarthy, K., Mehta, A., Meziane, M., O'Donovan, P., Schilz, R., Sullivan, E., Ginsburg, M., Scharf, S., Jellen, P., Asegu, A., Austin, J., Bartels, M., Berkman, Y., Berkoski, P., Brogan, F., Delphin, E., Demercado, G., DiMango, A., DePrisco, L., Gonzales, J., Gotthelf, J., Herman, P., Khan, A., Mantinaos, M., McKeon, K., Mets, B., Pearson, G., Pfeffer, J., Rossoff, L., Sunshine, A., Simonelli, P., Stavrolakes, K., Thomashow, B., Vilotijevic, D., Yip, C., MacIntyre, N., Davis, R. D., Howe, J., Crouch, R., Grichnik, K., Harpole, D., Krichman, A., Lawlor, B., McAdams, H., Norten, J., Rinaldo-Gallo, S., Steele, M., Tapson, V., Hubmayr, R., Deschamps, C., Bartling, S., Aughenbaugh, G., Bradt, K., Edgar, M., Elliott, B., Edell, E., Garrett, J., Hanson, K., Hanson, L., Harms, G., Hartman, T., Kalra, S., Karsell, P., Midthun, D., Miller, D., Mottram, C., Odenbrett, K., Swensen, S., Sykes, A. M., Torres, N., Utz, J., Cherniack, R., Make, B., Gilmartin, M., Buquor, B., Canterbury, J., Carlos, M., Chetham, P., Fernandez, E., Geyman, L., Lynch, D., Newell, J., Pomerantz, M., Raymond, C., Safilian, B., Tolliver, R., Whalen-Price, J., Winner, K., Zamora, M., Diaz, P., Ross, P., Kelsey, M., Dinant, S., King, M., Harter, R., Mikelinich, E., Rittenger, D., Shaffer, S., Naunheim, K., Keller, C., Osterloh, J., Alvarez, F., Borosh, S., Bowen, C., Frese, S., Glockner, J., Heiberg, E., Hibbett, A., Kleinhenz, M. E., McCain, D., Ruppel, G., Turnage, W. S., Criner, G., Furukawa, S., Kuzma, A. M., Barnette, R., Boiselle, P., Brester, N., D'Alonzo, G., Gilmartin, M., Keresztury, M., Kish, L., Lautensack, K., Leonard, E., Leyenson, V., Lorenzon, M., O'Brien, G., O'Grady, T., Rising, P., Schartel, S., Travaline, J., Ries, A., Kaplan, R., Ramirez, C., Brewer, N., Colt, H., Crawford, S., Frankville, D., Friedman, P., Johnson, J., Kapelanski, D., Larsen, C., Limberg, T., Magliocca, M., Olson, L., Papatheofanis, F. J., Prewitt, L., Resnikoff, P., Sassi-Dambron, D., Krasna, M., Orens, J., Moskowitz, I., Altemus, M., Bochicchio, D., Britt, E. J., Cook, L., Fessler, H., Gaetani, D., Gheorghiu, I., Gilbert, T., Hasnain, J., Kearney, A., Kim, S., King, K., Markus, S., Miller, N., Schneider, R., Shade, D., Silver, K., Smith, K., Turner, C., Weir, C., Wheeler, J., White, C., Martinez, F., Iannettoni, M., Meldrum, C., Alexander, J., Bria, W., Campbell, K., Christensen, P., Foss, C., Gill, P., Kazanjian, P., Kazerooni, E., Knieper, V., Lowenbergh, N., Meldrum, M., Miller, R., Ojo, T., Piergentili, D., Poole, L., Quint, L., Rysso, P., Spear, M., True, M., Woodcock, B., Kaiser, L., Hansen-Flaschen, J., Wurster, A., Alavi, A., Alcorn, T., Aronchick, J., Arcasoy, S., Aukberg, S., Benedict, B., Craemer, S., Edelman, J., Gefter, W., Kotler-Klein, L., Kotloff, R., Manaker, S., Mendez, J., Miller, W., Miller, W., Palevsky, H., Russell, W., Simcox, R., Snedeker, S., Tino, G., Keenan, R., Sciurba, F., George, E., Ayres, G., Bauldoff, G., Brown, M., Costello, P., Donahoe, M., Fuhrman, C., Hoffman, R., Holbert, M., Johnson, P., Kopp, T., Lacomis, J., Sexton, J., Silfies, L., Slivka, W., Strollo, D., Sullivan, E., Tullock, W., Benditt, J., Wood, D., Snyder, M., Anable, K., Battaglia, N., Boitano, L., Bowdle, A., Chan, L., Chwalik, C., Culver, B., Godwin, D., Golden, S., Ibrahim, A., Lockhart, D., Marglin, S., McDowell, P., Nellum, K., Van Norman, G., Bosco, L., Chiang, Y. P., Clancy, C., Handelsman, H., Piantadosi, S., Tonascia, J., Belt, P., Collins, K., Collison, B., Dawson, C., Dawson, D., Donithan, M., Edmonds, V., Harle, J., Jackson, R., Lee, S., Levine, C., Meinert, J., Nowakowski, D., Reshef, D., Smith, M., Simon, B., Sternberg, A., Van Natta, M., Wise, R., Kaplan, R. M., Chiang, Y. P., Fahs, M. C., Fendrick, A. M., Moskowitz, A. J., Pathak, D., Ramsey, S. D., Richter, E., Schwartz, J. S., Sheingold, S., Shroyer, A. L., Wagner, J., Yusen, R., Waldhausen, J., Bernard, G., deMets, D., Hoover, E., Levine, R., Mahler, D., McSweeney, A. J., Wiener-Kronish, J., Williams, O. D., Younes, M., Sheingold, S., McVearry, K., Mone, C., Proctor-Young, J., Fishman, A. P., Weinmann, G., Deshler, J., Albert, P., Hurd, S., Kiley, J., Wu, M. 1999; 118 (3): 518-528

    View details for Web of Science ID 000082468800023

    View details for PubMedID 10469970

  • Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis SCIENCE Winzeler, E. A., Shoemaker, D. D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J. D., Bussey, H., Chu, A. M., Connelly, C., Davis, K., Dietrich, F., Dow, S. W., El Bakkoury, M., Foury, F., Friend, S. H., Gentalen, E., Giaever, G., Hegemann, J. H., Jones, T., Laub, M., Liao, H., Liebundguth, N., Lockhart, D. J., Lucau-Danila, A., Lussier, M., M'Rabet, N., Menard, P., Mittmann, M., Pai, C., Rebischung, C., Revuelta, J. L., Riles, L., Roberts, C. J., Ross-Macdonald, P., Scherens, B., Snyder, M., Sookhai-Mahadeo, S., Storms, R. K., Veronneau, S., Voet, M., Volckaert, G., Ward, T. R., Wysocki, R., Yen, G. S., Yu, K. X., Zimmermann, K., Philippsen, P., Johnston, M., Davis, R. W. 1999; 285 (5429): 901-906

    Abstract

    The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs (more than one-third of the ORFs in the genome). Of the deleted ORFs, 17 percent were essential for viability in rich medium. The phenotypes of more than 500 deletion strains were assayed in parallel. Of the deletion strains, 40 percent showed quantitative growth defects in either rich or minimal medium.

    View details for Web of Science ID 000081860900053

    View details for PubMedID 10436161

  • Differential regulation of the Kar3p kinesin-related protein by two associated proteins, Cik1p and Vik1p JOURNAL OF CELL BIOLOGY Manning, B. D., Barrett, J. G., Wallace, J. A., Granok, H., Snyder, M. 1999; 144 (6): 1219-1233

    Abstract

    The mechanisms by which kinesin-related proteins interact with other proteins to carry out specific cellular processes are poorly understood. The kinesin-related protein, Kar3p, has been implicated in many microtubule functions in yeast. Some of these functions require interaction with the Cik1 protein (Page, B.D., L.L. Satterwhite, M.D. Rose, and M. Snyder. 1994. J. Cell Biol. 124:507-519). We have identified a Saccharomyces cerevisiae gene, named VIK1, encoding a protein with sequence and structural similarity to Cik1p. The Vik1 protein is detected in vegetatively growing cells but not in mating pheromone-treated cells. Vik1p physically associates with Kar3p in a complex separate from that of the Kar3p-Cik1p complex. Vik1p localizes to the spindle-pole body region in a Kar3p-dependent manner. Reciprocally, concentration of Kar3p at the spindle poles during vegetative growth requires the presence of Vik1p, but not Cik1p. Phenotypic analysis suggests that Cik1p and Vik1p are involved in different Kar3p functions. Disruption of VIK1 causes increased resistance to the microtubule depolymerizing drug benomyl and partially suppresses growth defects of cik1Delta mutants. The vik1Delta and kar3Delta mutations, but not cik1Delta, partially suppress the temperature-sensitive growth defect of strains lacking the function of two other yeast kinesin-related proteins, Cin8p and Kip1p. Our results indicate that Kar3p forms functionally distinct complexes with Cik1p and Vik1p to participate in different microtubule-mediated events within the same cell.

    View details for Web of Science ID 000079470900011

    View details for PubMedID 10087265

  • SHC1, a high pH inducible gene required for growth at alkaline pH in Saccharomyces cerevisiae BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS Hong, S. K., Han, S. B., Snyder, M., Choi, E. Y. 1999; 255 (1): 116-122

    Abstract

    In this study, we carried out a large-scale transposon tagging screen to identify genes whose expression is regulated by ambient pH. Of 35,000 transformants, two strains carrying genes whose expression is strictly dependent on the pH of the growth medium were identified. One of the genes, with 20-fold induction by alkaline pH, was identified as the SHC1 gene in the Yeast Genome Directory; its expression was highest at alkaline pH and moderately induced by osmotic stress. However, the gene was expressed neither at acidic pH nor under other stress conditions. The haploid mutant with a truncated shc1 gene showed growth retardation and an abnormal morphology at alkaline pH. In contrast, introducing the wild-type SHC1 gene into the mutant strain reversed the mutant phenotype. To confirm that Shc1p is an alkali-inducible protein, a monoclonal antibody to Shc1p was produced. While a 55-kDa protein band appeared on the Western blot of cells grown at alkaline pH, Shc1p was barely detectable on the blots of cells grown in YPD. Our results indicate that yeast cells have an efficient system for adapting to large variations in ambient pH and that SHC1 is one of the genes required for growth at alkaline pH.

    View details for Web of Science ID 000078599700021

    View details for PubMedID 10082665

  • Nim1-related kinases coordinate cell cycle progression with the organization of the peripheral cytoskeleton in yeast GENES & DEVELOPMENT Barral, Y., Parra, M., Bidlingmaier, S., Snyder, M. 1999; 13 (2): 176-187

    Abstract

    The mechanisms that couple cell cycle progression with the organization of the peripheral cytoskeleton are poorly understood. In Saccharomyces cerevisiae, the Swe1 protein has been shown previously to phosphorylate and inactivate the cyclin-dependent kinase, Cdc28, thereby delaying the onset of mitosis. The nim1-related protein kinase, Hsl1, induces entry into mitosis by negatively regulating Swe1. We have found that Hsl1 physically associates with the septin cytoskeleton in vivo and that Hsl1 kinase activity depends on proper septin function. Genetic analysis indicates that two additional Hsl1-related kinases, Kcc4 and Gin4, act redundantly with Hsl1 to regulate Swe1. Kcc4, like Hsl1 and Gin4, was found to localize to the bud neck in a septin-dependent fashion. Interestingly, hsl1 kcc4 gin4 triple mutants develop a cellular morphology extremely similar to that of septin mutants. Consistent with the idea that Hsl1, Kcc4, and Gin4 link entry into mitosis to proper septin organization, we find that septin mutants incubated at the restrictive temperature trigger a Swe1-dependent mitotic delay that is necessary to maintain cell viability. These results reveal for the first time how cells monitor the organization of their cytoskeleton and demonstrate the existence of a cell cycle checkpoint that responds to defects in the peripheral cytoskeleton. Moreover, Hsl1, Kcc4, and Gin4 have homologs in higher eukaryotes, suggesting that the regulation of Swe1/Wee1 by this class of kinases is highly conserved.

    View details for Web of Science ID 000078395100007

    View details for PubMedID 9925642

  • Transposon mutagenesis for the analysis of protein production, function, and localization CDNA PREPARATION AND CHARACTERIZATION Ross-Macdonald, P., Sheehan, A., Friddle, C., Roeder, G. S., Snyder, M. 1999; 303: 512-532

    View details for Web of Science ID 000081913000029

    View details for PubMedID 10349663

  • Spa2p interacts with cell polarity proteins and signaling components involved in yeast cell morphogenesis MOLECULAR AND CELLULAR BIOLOGY Sheu, Y. J., Santos, B., Fortin, N., Costigan, C., Snyder, M. 1998; 18 (7): 4053-4069

    Abstract

    The yeast protein Spa2p localizes to growth sites and is important for polarized morphogenesis during budding, mating, and pseudohyphal growth. To better understand the role of Spa2p in polarized growth, we analyzed regions of the protein important for its function and proteins that interact with Spa2p. Spa2p interacts with Pea2p and Bud6p (Aip3p) as determined by the two-hybrid system; all of these proteins exhibit similar localization patterns, and spa2Delta, pea2Delta, and bud6Delta mutants display similar phenotypes, suggesting that these three proteins are involved in the same biological processes. Coimmunoprecipitation experiments demonstrate that Spa2p and Pea2p are tightly associated with each other in vivo. Velocity sedimentation experiments suggest that a significant portion of Spa2p, Pea2p, and Bud6p cosediment, raising the possibility that these proteins form a large, 12S multiprotein complex. Bud6p has been shown previously to interact with actin, suggesting that the 12S complex functions to regulate the actin cytoskeleton. Deletion analysis revealed that multiple regions of Spa2p are involved in its localization to growth sites. One of the regions involved in Spa2p stability and localization interacts with Pea2p; this region contains a conserved domain, SHD-II. Although a portion of Spa2p is sufficient for localization of itself and Pea2p to growth sites, only the full-length protein is capable of complementing spa2 mutant defects, suggesting that other regions are required for Spa2p function. By using the two-hybrid system, Spa2p and Bud6p were also found to interact with components of two mitogen-activated protein kinase (MAPK) pathways important for polarized cell growth. Spa2p interacts with Ste11p (MAPK kinase [MEK] kinase) and Ste7p (MEK) of the mating signaling pathway as well as with the MEKs Mkk1p and Mkk2p of the Slt2p (Mpk1p) MAPK pathway; for both Mkk1p and Ste7p, the Spa2p-interacting region was mapped to the N-terminal putative regulatory domain. Bud6p interacts with Ste11p. The MEK-interacting region of Spa2p corresponds to the highly conserved SHD-I domain, which is shown to be important for mating and MAPK signaling. spa2 mutants exhibit reduced levels of pheromone signaling and an elevated level of Slt2p kinase activity. We thus propose that Spa2p, Pea2p, and Bud6p function together, perhaps as a complex, to promote polarized morphogenesis through regulation of the actin cytoskeleton and signaling pathways.

    View details for Web of Science ID 000074380100044

    View details for PubMedID 9632790

  • Ursodiol prophylaxis against hepatic complications of allogeneic bone marrow transplantation - A randomized, double-blind, placebo-controlled trial ANNALS OF INTERNAL MEDICINE Essell, J. H., Schroeder, M. T., Harman, G. S., Halvorson, R., Lew, V., Callander, N., Snyder, M., Lewis, S. K., Allerton, J. P., Thompson, J. M. 1998; 128 (12): 975-?

    Abstract

    Hepatic complications are a major cause of illness and death after bone marrow transplantation. To confirm the results of a pilot study that indicated that ursodiol prophylaxis could reduce the incidence of veno-occlusive disease of the liver. Randomized, double-blind, placebo-controlled study. Tertiary care teaching hospital. 67 consecutive patients undergoing transplantation with allogeneic bone marrow (donated by a relative) in whom busulfan plus cyclophosphamide was used as the preparative regimen and cyclosporine plus methotrexate was used to prevent graft-versus-host disease. Before the preparative regimen was started, patients were randomly assigned to receive ursodiol, 300 mg twice daily (or 300 mg in the morning and 600 mg in the evening if body weight was > 90 kg), or placebo. Patients were prospectively evaluated for the clinical diagnosis of veno-occlusive disease, the occurrence of acute graft-versus-host disease, and survival. The incidence of veno-occlusive disease was 40% (13 of 32 patients) in placebo recipients and 15% (5 of 34 patients) in ursodiol recipients (P = 0.03). Assignment to placebo was the only pretransplantation characteristic that predicted the development of veno-occlusive disease. The most significant predictor of 100-day mortality was the diagnosis of veno-occlusive disease. The difference in actuarial risk for hematologic relapse in patients with chronic myelogenous leukemia and nonhepatic toxicities between the two groups was not statistically significant (13% in the ursodiol group and 20% in the placebo group; P > 0.2). Ursodiol prophylaxis seemed to decrease the incidence of hepatic complications after allogeneic bone marrow transplantation in patients who received a preparative regimen with busulfan plus cyclophosphamide.

    View details for Web of Science ID 000074201300002

    View details for PubMedID 9625683

  • Pheromone-regulated genes required for yeast mating differentiation JOURNAL OF CELL BIOLOGY Erdman, S., Lin, L., Malczynski, M., Snyder, M. 1998; 140 (3): 461-483

    Abstract

    Yeast cells mate by an inducible pathway that involves agglutination, mating projection formation, cell fusion, and nuclear fusion. To obtain insight into the mating differentiation of Saccharomyces cerevisiae, we carried out a large-scale transposon tagging screen to identify genes whose expression is regulated by mating pheromone. 91,200 transformants containing random lacZ insertions were screened for beta-galactosidase (beta-gal) expression in the presence and absence of alpha factor, and 189 strains containing pheromone-regulated lacZ insertions were identified. Transposon insertion alleles corresponding to 20 genes that are novel or had not previously been known to be pheromone regulated were examined for effects on the mating process. Mutations in four novel genes, FIG1, FIG2, KAR5/FIG3, and FIG4, were found to cause mating defects. Three of the proteins encoded by these genes, Fig1p, Fig2p, and Fig4p, are dispensable for cell polarization in uniform concentrations of mating pheromone, but are required for normal cell polarization in mating mixtures, conditions that involve cell-cell communication. Fig1p and Fig2p are also important for cell fusion and conjugation bridge shape, respectively. The fourth protein, Kar5p/Fig3p, is required for nuclear fusion. Fig1p and Fig2p are likely to act at the cell surface as Fig1::beta-gal and Fig2::beta-gal fusion proteins localize to the periphery of mating cells. Fig4p is a member of a family of eukaryotic proteins that contain a domain homologous to the yeast Sac1p. Our results indicate that a variety of novel genes are expressed specifically during mating differentiation to mediate proper cell morphogenesis, cell fusion, and other steps of the mating process.

    View details for Web of Science ID 000072026300002

    View details for PubMedID 9456310

  • The Spa2-related protein, Sph1p, is important for polarized growth in yeast JOURNAL OF CELL SCIENCE Roemer, T., Vallier, L., Sheu, Y. J., Snyder, M. 1998; 111: 479-494

    Abstract

    The Saccharomyces cerevisiae protein Sph1p is both structurally and functionally related to the polarity protein, Spa2p. Sph1p and Spa2p are predicted to share three 100-amino acid domains each exceeding 30% sequence identity, and the amino-terminal domain of each protein contains a direct repeat common to Homo sapiens and Caenorhabditis elegans protein sequences. sph1- and spa2-deleted cells possess defects in mating projection morphology and pseudohyphal growth. sph1(Delta) spa2(Delta) double mutants also exhibit a strong haploid invasive growth defect and an exacerbated mating projection defect relative to either sph1(Delta) or spa2(Delta) single mutants. Consistent with a role in polarized growth, Sph1p localizes to growth sites in a cell cycle-dependent manner: Sph1p concentrates as a cortical patch at the presumptive bud site in unbudded cells, at the tip of small, medium and large buds, and at the bud neck prior to cytokinesis. In pheromone-treated cells, Sph1p localizes to the tip of the mating projection. Proper localization of Sph1p to sites of active growth during budding and mating requires Spa2p. Sph1p interacts in the two-hybrid system with three mitogen-activated protein (MAP) kinase kinases (MAPKKs): Mkk1p and Mkk2p, which function in the cell wall integrity/cell polarization MAP kinase pathway, and Ste7p, which operates in the pheromone and pseudohyphal signaling response pathways. Sph1p also interacts weakly with STE11, the MAPKKK known to activate STE7. Moreover, two-hybrid interactions between SPH1 and STE7 and STE11 occur independently of STE5, a proposed scaffolding protein which interacts with several members of this MAP kinase module. We speculate that Spa2p and Sph1p may function during pseudohyphal and haploid invasive growth to help tether this MAP kinase module to sites of polarized growth. Our results indicate that Spa2p and Sph1p comprise two related proteins important for the control of cell morphogenesis in yeast.

    View details for Web of Science ID 000072336900007

    View details for PubMedID 9443897

  • Transposon tagging I: A novel system for monitoring protein production, function and localization YEAST GENE ANALYSIS Ross-Macdonald, P., Sheehan, A., Friddle, C., Roeder, G. S., Snyder, M. 1998; 26: 161-179

  • Cell polarity and morphogenesis in budding yeast ANNUAL REVIEW OF MICROBIOLOGY Madden, K., Snyder, M. 1998; 52: 687-744

    Abstract

    Eukaryotic cells respond to intracellular and extracellular cues to direct asymmetric cell growth and division. The yeast Saccharomyces cerevisiae undergoes polarized growth at several times during budding and mating and is a useful model organism for studying asymmetric growth and division. In recent years, many regulatory and cytoskeletal components important for directing and executing growth have been identified, and molecular mechanisms have been elucidated in yeast. Key signaling pathways that regulate polarization during the cell cycle and mating response have been described. Since many of the components important for polarized cell growth are conserved in other organisms, the basic mechanisms mediating polarized cell growth are likely to be universal among eukaryotes.

    View details for Web of Science ID 000076541000021

    View details for PubMedID 9891811

  • The Rho-GEF Rom2p localizes to sites of polarized cell growth and participates in cytoskeletal functions in Saccharomyces cerevisiae MOLECULAR BIOLOGY OF THE CELL Manning, B. D., Padmanabha, R., Snyder, M. 1997; 8 (10): 1829-1844

    Abstract

    Rom2p is a GDP/GTP exchange factor for Rho1p and Rho2p GTPases; Rho proteins have been implicated in control of actin cytoskeletal rearrangements. ROM2 and RHO2 were identified in a screen for high-copy number suppressors of cik1 delta, a mutant defective in microtubule-based processes in Saccharomyces cerevisiae. A Rom2p::3XHA fusion protein localizes to sites of polarized cell growth, including incipient bud sites, tips of small buds, and tips of mating projections. Disruption of ROM2 results in temperature-sensitive growth defects at 11 degrees C and 37 degrees C. rom2 delta cells exhibit morphological defects. At permissive temperatures, rom2 delta cells often form elongated buds and fail to form normal mating projections after exposure to pheromone; at the restrictive temperature, small budded cells accumulate. High-copy number plasmids containing either ROM2 or RHO2 suppress the temperature-sensitive growth defects of cik1 delta and kar3 delta strains. KAR3 encodes a kinesin-related protein that interacts with Cik1p. Furthermore, rom2 delta strains exhibit increased sensitivity to the microtubule depolymerizing drug benomyl. These results suggest a role for Rom2p in both polarized morphogenesis and functions of the microtubule cytoskeleton.

    View details for Web of Science ID A1997YB66300001

    View details for PubMedID 9348527

  • Human dishevelled genes constitute a DHR-containing multigene family GENOMICS Semenov, M. V., Snyder, M. 1997; 42 (2): 302-310

    Abstract

    Three human genes encoding proteins homologous to Drosophila Dishevelled protein were cloned and characterized. Amino acid similarity between the different Dishevelled proteins is concentrated in three highly conserved regions. Two of these regions do not exhibit significant sequence similarity with other known proteins; the third is similar to the discs-large homology region, which was first found in a Drosophila Discs-large tumor suppressor protein (also known as GLGF or PDZ domain). We produced antibodies against human Dishevelled-2 and demonstrated that it is a phosphoprotein and can be detected in all cell lines and human embryonic tissues examined. Indirect immunofluorescence indicates that it is found throughout the cytoplasm. Our results indicate that the human dishevelled genes constitute a multigene family and that Dishevelled proteins are highly conserved among metazoans.

    View details for Web of Science ID A1997XE86000015

    View details for PubMedID 9192851

  • SBF cell cycle regulator as a target of the yeast PKC-MAP kinase pathway SCIENCE Madden, K., Sheu, Y. J., Baetz, K., Andrews, B., Snyder, M. 1997; 275 (5307): 1781-1784

    Abstract

    Protein kinase C (PKC) signaling is highly conserved among eukaryotes and has been implicated in the regulation of cellular processes such as cell proliferation and growth. In the budding yeast, PKC1 functions to activate the SLT2(MPK1) mitogen-activated protein (MAP) kinase cascade, which is required for the maintenance of cell integrity during asymmetric cell growth. Genetic studies, coimmunoprecipitation experiments, and analysis of protein phosphorylation in vivo and in vitro indicate that the SBF transcription factor (composed of Swi4p and Swi6p), an important regulator of gene expression at the G1 to S phase cell cycle transition, is a target of the Slt2p(Mpk1p) MAP kinase. These studies provide evidence for a direct role of the PKC1 pathway in the regulation of the yeast cell cycle and cell growth and indicate that conserved signaling pathways can act to control key regulators of cell division.

    View details for Web of Science ID A1997WP05600038

    View details for PubMedID 9065400

  • Targeting of chitin synthase 3 to polarized growth sites in yeast requires Chs5p and Myo2p JOURNAL OF CELL BIOLOGY Santos, B., Snyder, M. 1997; 136 (1): 95-110

    Abstract

    Chitin is an essential structural component of the yeast cell wall whose deposition is regulated throughout the yeast life cycle. The temporal and spatial regulation of chitin synthesis was investigated during vegetative growth and mating of Saccharomyces cerevisiae by localization of the putative catalytic subunit of chitin synthase III, Chs3p, and its regulator, Chs5p. Immunolocalization of epitope-tagged Chs3p revealed a novel localization pattern that is cell cycle-dependent. Chs3p is polarized as a diffuse ring at the incipient bud site and at the neck between the mother and bud in small-budded cells; it is not found at the neck in large-budded cells containing a single nucleus. In large-budded cells undergoing cytokinesis, it reappears as a ring at the neck. In cells responding to mating pheromone, Chs3p is found throughout the projection. The appearance of Chs3p at cortical sites correlates with times that chitin synthesis is expected to occur. In addition to its localization at the incipient bud site and neck, Chs3p is also found in cytoplasmic patches in cells at different stages of the cell cycle. Epitope-tagged Chs5p also localizes to cytoplasmic patches; these patches contain Kex2p, a late Golgi-associated enzyme. Unlike Chs3p, Chs5p does not accumulate at the incipient bud site or neck. Nearly all Chs3p patches contain Chs5p, whereas some Chs5p patches lack detectable Chs3p. In the absence of Chs5p, Chs3p localizes in cytoplasmic patches, but it is no longer found at the neck or the incipient bud site, indicating that Chs5p is required for the polarization of Chs3p. Furthermore, Chs5p localization is not affected either by temperature shift or by the myo2-66 mutation; however, Chs3p polarization is affected by temperature shift and myo2-66. We suggest a model in which Chs3p polarization to cortical sites in yeast is dependent on both Chs5p and the actin cytoskeleton/Myo2p.

    View details for Web of Science ID A1997WC96100009

    View details for PubMedID 9008706

  • A multipurpose transposon system for analyzing protein production, localization, and function in Saccharomyces cerevisiae PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Ross-Macdonald, P., Sheehan, A., Roeder, G. S., Snyder, M. 1997; 94 (1): 190-195

    Abstract

    Analysis of the function of a particular gene product typically involves determining the expression profile of the gene, the subcellular location of the protein, and the phenotype of a null strain lacking the protein. Conditional alleles of the gene are often created as an additional tool. We have developed a multifunctional, transposon-based system that simultaneously generates constructs for all the above analyses and is suitable for mutagenesis of any given Saccharomyces cerevisiae gene. Depending on the transposon used, the yeast gene is fused to a coding region for beta-galactosidase or green fluorescent protein. Gene expression can therefore be monitored by chemical or fluorescence assays. The transposons create insertion mutations in the target gene, allowing phenotypic analysis. The transposon can be reduced by cre-lox site-specific recombination to a smaller element that leaves an epitope tag inserted in the encoded protein. In addition to its utility for a variety of immunodetection purposes, the epitope tag element also has the potential to create conditional alleles of the target gene. We demonstrate these features of the transposons by mutagenesis of the SPA2, ARP100, SER1, and BDF1 genes.

    View details for Web of Science ID A1997WC34700036

    View details for PubMedID 8990184

  • Selection of polarized growth sites in yeast TRENDS IN CELL BIOLOGY Roemer, T., Vallier, L. G., Snyder, M. 1996; 6 (11): 434-441

    Abstract

    The budding yeast Saccharomyces cerevisiae responds to intracellular and extracellular cues to direct cell growth. Genetic analysis has revealed many components that participate in this process and has provided insight into the mechanisms by which these proteins function. Several of these components, such as the septins, pheromone receptors and GTPase proteins, have homologues in multicellular eukaryotes, suggesting that many aspects of polarized cell growth may be conserved throughout evolution. This review discusses our current understanding of the molecular mechanisms of growth-site selection during the different stages of the yeast life cycle.

    View details for Web of Science ID A1996VT35300007

    View details for PubMedID 15157515

  • DNA gyrase and topoisomerase IV on the bacterial chromosome: Quinolone-induced DNA cleavage JOURNAL OF MOLECULAR BIOLOGY Chen, C. R., Malik, M., Snyder, M., Drlica, K. 1996; 258 (4): 627-637

    Abstract

    DNA gyrase, the bacterial enzyme that supercoils DNA, is trapped on chromosomal DNA by the 4-quinolone compounds, as drug-gyrase complexes that contain DNA breaks. Examination of chromosomal DNA extracted from Escherichia coli indicated that bacteriostatic concentrations of oxolinic acid trap gyrase and block DNA synthesis without releasing broken DNA from gyrase-DNA complexes. Release, detected as free rotation of DNA in the presence of an intercalating dye, occurred only at high, bactericidal oxolinic acid concentrations. Release of DNA breaks and cell death were both blocked by chloramphenicol, an inhibitor of protein synthesis, suggesting that synthesis of additional protein activity is required to free the DNA ends. Ciprofloxacin, a more potent quinolone, released DNA breaks and killed cells even in the presence of chloramphenicol. It is proposed that this second, chloramphenicol-insensitive mode for release of DNA breaks and cell killing arises from dissociation of gyrase subunits. Ciprofloxacin also killed a gyrase (gyrA) mutant resistant to the prototype of quinolone, nalidixic acid, and created complexes on DNA detected by DNA fragmentation. This lethal effect of ciprofloxacin was eliminated by additional mutations mapping in parC, one of the two genes encoding topoisomerase IV. Thus, the fluoroquinolone compounds have two intracellular targets. In the absence of the gyrA mutation, the parC (CipR) allele did not by itself confer resistance to ciprofloxacin, indicating that gyrase is the major quinolone target in E. coli. These findings provide a molecular explanation for quinolone action in bacteria and a new way to study topoisomerase IV-chromosome interactions.

    View details for Web of Science ID A1996UK55800009

    View details for PubMedID 8636997

  • Selection of axial growth sites in yeast requires Axl2p, a novel plasma membrane glycoprotein GENES & DEVELOPMENT Roemer, T., Madden, K., Chang, J. T., Snyder, M. 1996; 10 (7): 777-793

    Abstract

    Spa2p and Cdc10p both participate in bud site selection and cell morphogenesis in yeast, and spa2delta cdc10-10 cells are inviable. To identify additional components important for these processes in yeast, a colony-sectoring assay was used to isolate high-copy suppressors of the spa2delta cdc10-10 lethality. One such gene, AXL2, has been characterized in detail. axl2 cells are defective in bud site selection in haploid cells and bud in a bipolar fashion. Genetic analysis indicates that AXL2 falls into the same epistasis group as BUD3. Axl2p is predicted to be a type I transmembrane protein. Tunicamycin treatment experiments, biochemical fractionation and extraction experiments, and proteinase K protection experiments collectively indicate that Axl2p is an integral membrane glycoprotein at the plasma membrane. Indirect immunofluorescence experiments using either Axl2p tagged with three copies of a hemagglutinin epitope or high-copy AXL2 and anti-Axl2p antibodies reveal a unique localization pattern for Axl2p. The protein is present as a patch at the incipient bud site and in emerging buds, and at the bud periphery in small-budded cells. In cells containing medium-sized or large buds, Axl2p is located as a ring at the neck. Thus, Axl2p is a novel membrane protein critical for selecting proper growth sites in yeast. We suggest that Axl2p acts as an anchor in the plasma membrane that helps direct new growth components and/or polarity establishment components to the cortical axial budding site.

    View details for Web of Science ID A1996UE55800001

    View details for PubMedID 8846915

  • Target gene identification: Target specific transcriptional activation by three murine homeodomain VP16 hybrid proteins in Saccharomyces cerevisiae JOURNAL OF EXPERIMENTAL ZOOLOGY FriedmanEinat, M., Einat, P., Snyder, M., Ruddle, F. 1996; 274 (3): 145-156

    Abstract

    The mammalian homeodomain proteins encoded by Hox genes play an important role in embryonic development by providing positional cues which define developmental identities along the anteroposterior axis of developing organisms. These proteins bind DNA specifically through their homeodomain to sequences containing ATTA cores, and thereby are thought to exert their effect by regulating downstream genes. Little is known about the specificity of binding of homeodomain proteins to their sequences and the identity of their target genes. We have developed a transcriptional activation assay in yeast which employs a homeobox/VP16 fusion gene as a transcriptional activator and a target construct in which test fragments of DNA are inserted upstream to a reporter gene. Using this assay, we compared transcriptional activation by three chimeric proteins containing the homeodomains of the mouse homeobox genes, Hoxa-5, Hoxb-6, and Hoxc-8. When tested on previously defined target sequences, strong differential specificities of activation were observed. In an effort to identify enhancers that normally respond to homeodomain transcriptional activators, random fragments of mouse genomic DNA were cloned upstream of the reporter gene. Genomic DNA fragments with distinct activation profiles were obtained and were found to share matches beyond the ATTA core with previously described enhancers. These results demonstrate that the transcriptional activation system in yeast can be used as a convenient system to detect DNA motifs which bind homeodomain proteins, and subsequently, to identify authentic target genes responsive to Hox gene proteins.

    View details for Web of Science ID A1996UC28400001

    View details for PubMedID 8882492

  • Highly divergent gamma-tubulin gene is essential for cell growth and proper microtubule organization in Saccharomyces cerevisiae JOURNAL OF CELL BIOLOGY Sobel, S. G., Snyder, M. 1995; 131 (6): 1775-1788

    Abstract

    A Saccharomyces cerevisiae gamma-tubulin-related gene, TUB4, has been characterized. The predicted amino acid sequence of the Tub4 protein (Tub4p) is 29-38% identical to members of the gamma-tubulin family. Indirect immunofluorescence experiments using a strain containing an epitope-tagged Tub4p indicate that Tub4p resides at the spindle pole body throughout the yeast cell cycle. Deletion of the TUB4 gene indicates that Tub4p is essential for yeast cell growth. Tub4p-depleted cells arrest during nuclear division; most arrested cells contain a large bud, replicated DNA, and a single nucleus. Immunofluorescence and nuclear staining experiments indicate that cells depleted of Tub4p contain defects in the organization of both cytoplasmic and nuclear microtubule arrays; such cells exhibit nuclear migration failure, defects in spindle formation, and/or aberrantly long cytoplasmic microtubule arrays. These data indicate that the S. cerevisiae gamma-tubulin protein is an important SPB component that organizes both cytoplasmic and nuclear microtubule arrays.

    View details for Web of Science ID A1995TN76000011

    View details for PubMedID 8557744

  • Nuclear pore complex clustering and nuclear accumulation of poly(A)(+) RNA associated with mutation of the Saccharomyces cerevisiae RAT2/NUP120 gene JOURNAL OF CELL BIOLOGY Heath, C. V., Copeland, C. S., Amberg, D. C., DELPRIORE, V., Snyder, M., Cole, C. N. 1995; 131 (6): 1677-1697

    Abstract

    To identify genes involved in the export of messenger RNA from the nucleus to the cytoplasm, we used an in situ hybridization assay to screen temperature-sensitive strains of Saccharomyces cerevisiae. This identified those which accumulated poly(A)+ RNA in their nuclei when shifted to the non-permissive temperature of 37 degrees C. We describe here the properties of yeast strains carrying mutations in the RAT2 gene (RAT - ribonucleic acid trafficking) and the cloning of the RAT2 gene. Only a low percentage of cells carrying the rat2-1 allele showed nuclear accumulation of poly(A)+ RNA when cultured at 15 degrees or 23 degrees C, but within 4 h of a shift to the nonpermissive temperature of 37 degrees C, poly(A)+ RNA accumulated within the nuclei of approximately 80% of cells. No defect was seen in the nuclear import of a reporter protein bearing a nuclear localization signal. Nuclear pore complexes (NPCs) are distributed relatively evenly around the nuclear envelope in wild-type cells. In cells carrying either the rat2-1 or rat2-2 allele, NPCs were clustered together into one or a few regions of the nuclear envelope. This clustering was a constitutive property of mutant cells. NPCs remained clustered in crude nuclei isolated from mutant cells, indicating that these clusters are not able to redistribute around the nuclear envelope when nuclei are separated from cytoplasmic components. Electron microscopy revealed that these clusters were frequently found in a protuberance of the nuclear envelope and were often located close to the spindle pole body. The RAT2 gene encodes a 120-kD protein without similarity to other known proteins. It was essential for growth only at 37 degrees C, but the growth defect at high temperature could be suppressed by growth of mutant cells in the presence of high osmolarity media containing 1.0 M sorbitol or 0.9 M NaCl. The phenotypes seen in cells carrying a disruption of the RAT2 gene were very similar to those seen with the rat2-1 and rat2-2 alleles. Epitope tagging was used to show that Rat2p is located at the nuclear periphery and co-localizes with yeast NPC proteins recognized by the RL1 monoclonal antibody. The rat2-1 allele was synthetically lethal with both the rat3-1/nup133-1 and rat7-1/nup159-1 alleles. These results indicate that the product of this gene is a nucleoporin which we refer to as Rat2p/Nup120p.

    View details for Web of Science ID A1995TN76000004

    View details for PubMedID 8557737

  • MOLECULAR-BASIS OF CELL INTEGRITY AND MORPHOGENESIS IN SACCHAROMYCES-CEREVISIAE MICROBIOLOGICAL REVIEWS Cid, V. J., Duran, A., DELREY, F., Snyder, M. P., Nombela, C., Sanchez, M. 1995; 59 (3): 345-386

    Abstract

    In fungi and many other organisms, a thick outer cell wall is responsible for determining the shape of the cell and for maintaining its integrity. The budding yeast Saccharomyces cerevisiae has been a useful model organism for the study of cell wall synthesis, and over the past few decades, many aspects of the composition, structure, and enzymology of the cell wall have been elucidated. The cell wall of budding yeasts is a complex and dynamic structure; its arrangement alters as the cell grows, and its composition changes in response to different environmental conditions and at different times during the yeast life cycle. In the past few years, we have witnessed a prolific genetic and molecular characterization of some key aspects of cell wall polymer synthesis and hydrolysis in the budding yeast. Furthermore, this organism has been the target of numerous recent studies on the topic of morphogenesis, which have had an enormous impact on our understanding of the intracellular events that participate in directed cell wall synthesis. A number of components that direct polarized secretion, including those involved in assembly and organization of the actin cytoskeleton, secretory pathways, and a series of novel signal transduction systems and regulatory components have been identified. Analysis of these different components has suggested pathways by which polarized secretion is directed and controlled. Our aim is to offer an overall view of the current understanding of cell wall dynamics and of the complex network that controls polarized growth at particular stages of the budding yeast cell cycle and life cycle.

    View details for Web of Science ID A1995RT80800002

    View details for PubMedID 7565410

  • MUTATION OR DELETION OF THE SACCHAROMYCES-CEREVISIAE RAT3/NUP133 GENE CAUSES TEMPERATURE-DEPENDENT NUCLEAR ACCUMULATION OF POLY(A)(+) RNA AND CONSTITUTIVE CLUSTERING OF NUCLEAR-PORE COMPLEXES MOLECULAR BIOLOGY OF THE CELL Li, O., Heath, C. V., Amberg, D. C., Dockendorff, T. C., Copeland, C. S., Snyder, M., Cole, C. N. 1995; 6 (4): 401-417

    Abstract

    To identify genes whose products play potential roles in the nucleocytoplasmic export of messenger RNA, we isolated temperature-sensitive strains of Saccharomyces cerevisiae and examined them by fluorescent in situ hybridization. With the use of a digoxigenin-tagged oligo-(dT)50 probe, we identified those that showed nuclear accumulation of poly(A)+ RNA when cells were shifted to the nonpermissive temperature. We describe here the properties of yeast strains bearing the rat3-1 mutation (RAT-ribonucleic acid trafficking) and the cloning of the RAT3 gene. When cultured at the permissive temperature of 23 degrees C, fewer than 10% of cells carrying the rat3-1 allele showed nuclear accumulation of poly(A)+ RNA, whereas approximately 70% showed nuclear accumulation of poly(A)+ RNA after a shift to 37 degrees C for 4 h. In wild-type cells, nuclear pore complexes (NPCs) are distributed relatively evenly around the nuclear envelope. Both indirect immunofluorescence analysis and electron microscopy of rat3-1 cells indicated that NPCs were clustered into one or a few regions of the NE in mutant cells. Similar NPC clustering was seen in mutant cells cultured at temperatures between 15 degrees C and 37 degrees C. The RAT3 gene encodes an 1157-amino acid protein without similarity to other known proteins. It is essential for growth only at 37 degrees C. Cells carrying a disruption of the RAT3 gene were very similar to cells carrying the original rat3-1 mutation; they showed temperature-dependent nuclear accumulation of poly(A)+ RNA and exhibited constitutive clustering of NPCs. Epitope tagging of Rat3p demonstrated that it is located at the nuclear periphery and co-localizes with nuclear pore proteins recognized by the RL1 monoclonal antibody. We refer to this nucleoporin as Rat3p/Nup133p.

    View details for Web of Science ID A1995QW44600004

    View details for PubMedID 7626806

  • 2 SHORT AUTOEPITOPES ON THE NUCLEAR DOT ANTIGEN ARE SIMILAR TO EPITOPES ENCODED BY THE EPSTEIN-BARR-VIRUS PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA XIE, K. W., Snyder, M. 1995; 92 (5): 1639-1643

    Abstract

    To understand the relationship between antibodies present in patients with anti-nuclear dot (ND) autoimmune disease and the proteins they recognize, epitopes that react with the autoantibodies were mapped. A panel of fusion proteins containing different portions of the ND protein were overproduced in Escherichia coli. Immunoblot analysis with anti-ND antibodies revealed that most (10 of 12) sera recognize two major autoepitopes that are each a maximum of 8 amino acids long. The other two sera recognize one of the two epitopes. In addition to the short linear autoepitopes, a conformational epitope appears to be present on the ND antigen. Each of the two linear epitope sequences shares sequence similarities with those of several viral proteins found in the databases. Furthermore, two fusion proteins containing short Epstein-Barr virus (EBV) protein sequences that are similar to the ND epitopes were recognized by the human autoimmune sera, indicating that the autoepitopes are present in EBV protein sequences. Our results are consistent with the hypothesis that ND autoimmune disease might be associated with EBV infections.

    View details for Web of Science ID A1995QK07700081

    View details for PubMedID 7878031

  • Methods for large-scale analysis of gene expression, protein localization, and disruption phenotypes in Saccharomyces cerevisiae METHODS IN MOLECULAR AND CELLULAR BIOLOGY ROSSMACDONALD, P., Burns, N., Malczynski, M., Sheehan, A., Roeder, S., Snyder, M. 1995; 5 (5): 298-308

  • THE SPINDLE POLE BODY OF YEAST CHROMOSOMA Snyder, M. 1994; 103 (6): 369-380

    Abstract

    Microtubule organizing centers play an essential cellular role in nucleating microtubule assembly and establishing the microtubule array. The microtubule organizing center of yeast, the spindle pole body (SPB), shares many functions and properties with those of other organisms. In recent years considerable new information has been generated concerning components associated with the SPB, and the mechanism by which it duplicates. This article reviews our current view of the cytology and molecular composition of the SPB of the budding yeast, Saccharomyces cerevisiae, and the fission yeast, Schizosaccharomyces pombe. Genetic studies in these organisms have revealed information about how the SPB duplicates and separates, and its roles during vegetative growth, mating and meiosis.

    View details for Web of Science ID A1994PT81500001

    View details for PubMedID 7859557

  • SLK1, A YEAST HOMOLOG OF MAP KINASE ACTIVATORS, HAS A RAS/CAMP INDEPENDENT ROLE IN NUTRIENT SENSING MOLECULAR & GENERAL GENETICS Costigan, C., Snyder, M. 1994; 243 (3): 286-296

    Abstract

    The Saccharomyces cerevisiae SLK1 protein is implicated in nutrient sensing and growth control. Under nutrient-limiting conditions, slk1 mutants fail to undergo cell cycle arrest. The role of the SLK1 protein in nutrient sensing was examined with respect to the cAMP-dependent protein kinase (PKA) pathway, which has a well characterized role in growth control in yeast, and by the analysis of dominant SLK1 alleles that affect the nutrient response of wild-type cells. Interactions with the PKA pathway were examined by phenotypic analysis of double mutants of slk1 and various PKA pathway mutants. Combining the slk1-delta mutation with a mutation that is thought to constitutively activate the PKA pathway, pde2, resulted in enhanced growth control defects. The combination of slk1-delta with mutations that inhibit the PKA pathway, cdc25 and ras1, ras2, failed to alleviate the slk1 cell cycle arrest defect and lowered the permissive temperature for growth. Furthermore, bcy1 tpk1 tpk2 tpk3w (bcy1 tpkw) mutants, which have constitutive, low-level, cAMP-independent kinase activity, exhibit nutrient sensing, which is eliminated in the slk1 bcy1 tpkw mutants. These results implicated SLK1 in PKA-independent growth control in yeast. The amino-terminal, noncatalytic region of the SLK1 protein may be important in the regulation of SLK1 function in growth control. Overexpression of this region caused starvation sensitivity in wild-type cells by interfering with SLK1 protein function.

    View details for Web of Science ID A1994NM20300005

    View details for PubMedID 8190082

  • LARGE-SCALE ANALYSIS OF GENE-EXPRESSION, PROTEIN LOCALIZATION, AND GENE DISRUPTION IN SACCHAROMYCES-CEREVISIAE GENES & DEVELOPMENT Burns, N., Grimwade, B., ROSSMACDONALD, P. B., Choi, E. Y., Finberg, K., Roeder, G. S., Snyder, M. 1994; 8 (9): 1087-1105

    Abstract

    We have developed a large-scale screen to identify genes expressed at different times during the life cycle of Saccharomyces cerevisiae and to determine the subcellular locations of many of the encoded gene products. Diploid yeast strains containing random lacZ insertions throughout the genome have been constructed by transformation with a mutagenized genomic library. Twenty-eight hundred transformants containing fusion genes expressed during vegetative growth and 55 transformants containing meiotically induced fusion genes have been identified. Based on the frequency of transformed strains producing beta-galactosidase, we estimate that 80-86% of the yeast genome (excluding the rDNA) contains open reading frames expressed in vegetative cells and that there are 93-135 meiotically induced genes. Indirect immunofluorescence analysis of 2373 strains carrying fusion genes expressed in vegetative cells has identified 245 fusion proteins that localize to discrete locations in the cell, including the nucleus, mitochondria, endoplasmic reticulum, cytoplasmic dots, spindle pole body, and microtubules. The DNA sequence adjacent to the lacZ gene has been determined for 91 vegetative fusion genes whose products have been localized and for 43 meiotically induced fusions. Although most fusions represent genes unidentified previously, many correspond to known genes, including some whose expression has not been studied previously and whose products have not been localized. For example, Sec21-beta-gal fusion proteins yield a Golgi-like staining pattern, Ty1-beta-gal fusion proteins localize to cytoplasmic dots, and the meiosis-specific Mek1/Mre4-beta-gal and Spo11-beta-gal fusion proteins reside in the nucleus. The phenotypes in haploid cells have been analyzed for 59 strains containing chromosomal fusion genes expressed during vegetative growth; 9 strains fail to form colonies indicating that the disrupted genes are essential. Fifteen additional strains display slow growth or are impaired for growth on specific media or in the presence of inhibitors. Of 39 meiotically induced fusion genes examined, 14 disruptions confer defects in spore formation or spore viability in homozygous diploids. Our results will allow researchers who identify a yeast gene to determine immediately whether that gene is expressed at a specific time during the life cycle and whether its gene product localizes to a specific subcellular location.

    View details for Web of Science ID A1994NJ94700008

    View details for PubMedID 7926789

  • NHP6A AND NHP6B, WHICH ENCODE HMG1-LIKE PROTEINS, ARE CANDIDATES FOR DOWNSTREAM COMPONENTS OF THE YEAST SLT2 MITOGEN-ACTIVATED PROTEIN-KINASE PATHWAY MOLECULAR AND CELLULAR BIOLOGY Costigan, C., Kolodrubetz, D., Snyder, M. 1994; 14 (4): 2391-2403

    Abstract

    The yeast SLK1 (BCK1) gene encodes a mitogen-activated protein kinase (MAPK) activator protein which functions upstream in a protein kinase cascade that converges on the MAPK Slt2p (Mpk1p). Dominant alleles of SLK1 have been shown to bypass the conditional lethality of a protein kinase C mutation, pkc1-delta, suggesting that Pkc1p may regulate Slk1p function. Slk1p has an important role in morphogenesis and growth control, and deletions of the SLK1 gene are lethal in a spa2-delta mutant background. To search for genes that interact with the SLK1-SLT2 pathway, a synthetic lethal suppression screen was carried out. Genes which in multiple copies suppress the synthetic lethality of slk1-1 spa2-delta were identified, and one, the NHP6A gene, has been extensively characterized. The NHP6A gene and the closely related NHP6B gene were shown previously to encode HMG1-like chromatin-associated proteins. We demonstrate here that these genes are functionally redundant and that multiple copies of either NHP6A or NHP6B suppress slk1-delta and slt2-delta. Strains from which both NHP6 genes were deleted (nhp6-delta mutants) share many phenotypes with pkc1-delta, slk1-delta, and slt2-delta mutants. nhp6-delta cells display a temperature-sensitive growth defect that is rescued by the addition of 1 M sorbitol to the medium, and they are sensitive to starvation. nhp6-delta strains also exhibit a variety of morphological and cytoskeletal defects. At the restrictive temperature for growth, nhp6-delta mutant cells contain elongated buds and enlarged necks. Many cells have patches of chitin staining on their cell surfaces, and chitin deposition is enhanced at the necks of budded cells. nhp6-delta cells display a defect in actin polarity and often accumulate large actin chunks. Genetic and phenotypic analysis indicates that NHP6A and NHP6B function downstream of SLT2. Our results indicate that the Slt2p MAPK pathway in Saccharomyces cerevisiae may mediate its function in cell growth and morphogenesis, at least in part, through high-mobility group proteins.

    View details for Web of Science ID A1994NC05700018

    View details for PubMedID 8139543

  • MUTATIONS IN PRG1, A YEAST PROTEASOME-RELATED GENE, CAUSE DEFECTS IN NUCLEAR DIVISION AND ARE SUPPRESSED BY DELETION OF A MITOTIC CYCLIN GENE PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Friedman, H., Snyder, M. 1994; 91 (6): 2031-2035

    Abstract

    Proteasomes are ubiquitous complexes exhibiting proteolytic activity in vitro. The function(s) of these enzymes in vivo is not known. To investigate the in vivo role of proteasomes, four temperature-sensitive alleles of the Saccharomyces cerevisiae proteasome-related gene, PRG1, were constructed and analyzed. At both the permissive and restrictive temperatures, many prg1 cells have a large bud, contain replicated DNA, and have their nucleus positioned at the neck with a short spindle. These different phenotypes indicate a defect in nuclear division. Consistent with a nuclear division defect, prg1 mutant strains lose a dispensable chromosome at a higher frequency than wild-type cells. Importantly, deletion of CLB2, a gene encoding a mitotic cyclin, suppresses the temperature-sensitive growth phenotype of prg1 mutant strains. Our results indicate that proteasomes are important for nuclear division and suggest that they participate in degradation of the Clb2 protein (Clb2p).

    View details for Web of Science ID A1994NC04300014

    View details for PubMedID 8134345

  • LOCALIZATION OF THE KAR3 KINESIN HEAVY CHAIN-RELATED PROTEIN REQUIRES THE CIK1 INTERACTING PROTEIN JOURNAL OF CELL BIOLOGY Page, B. D., Satterwhite, L. L., Rose, M. D., Snyder, M. 1994; 124 (4): 507-519

    Abstract

    The Kar3 protein (Kar3p), a protein related to kinesin heavy chain, and the Cik1 protein (Cik1p) appear to participate in the same cellular processes in S. cerevisiae. Phenotypic analysis of mutants indicates that both CIK1 and KAR3 participate in spindle formation and karyogamy. In addition, the expression of both genes is induced by pheromone treatment. In vegetatively growing cells, both Cik1::beta-gal and Kar3::beta-gal fusions localize to the spindle pole body (SPB), and after pheromone treatment both fusion proteins localize to the spindle pole body and cytoplasmic microtubules. The dependence of Cik1p and Kar3p localization upon one another was investigated by indirect immunofluorescence of fusion proteins in pheromone-treated cells. The Cik1p::beta-gal fusion does not localize to the SPB or microtubules in a kar3 delta strain, and the Kar3p::beta-gal fusion protein does not localize to microtubule-associated structures in a cik1 delta strain. Thus, these proteins appear to be interdependent for localization to the SPB and microtubules. Analysis by both the two-hybrid system and co-immunoprecipitation experiments indicates that Cik1p and Kar3p interact, suggesting that they are part of the same protein complex. These data indicate that interaction between a putative kinesin heavy chain-related protein and another protein can determine the localization of motor activity and thereby affect the functional specificity of the motor complex.

    View details for Web of Science ID A1994MW69100010

    View details for PubMedID 8106549

  • NUCLEAR DOT ANTIGENS MAY SPECIFY TRANSCRIPTIONAL DOMAINS IN THE NUCLEUS MOLECULAR AND CELLULAR BIOLOGY XIE, K. W., Lambie, E. J., Snyder, M. 1993; 13 (10): 6170-6179

    Abstract

    A bank of 892 human autoimmune serum samples was screened by indirect immunofluorescence on human tissue culture HT-29 cells. Seven serum samples that stain 4 to 10 bright dots in cell lines of several different mammals, including humans, monkeys, rats, and pigs, were identified. Immunofluorescence experiments indicate that these antigens, called nuclear dot (ND) antigens, are distinct from splicing complexes, kinetochores, and other known nuclear structures. An ND antigen recognized by these sera was cloned by immunoscreening a human lambda gt11 expression library. Analysis of seven cDNA clones for the ND antigen indicates that several mRNAs exist, perhaps derived through alternative splicing mechanisms. One major form of the message has an open reading frame of 1,440 bp capable of encoding a 53,000-M(r) protein. Treatment of cells with detergent, salt, or RNase A fails to remove the ND antigen from the nucleus. However, incubation with DNase I obliterates ND staining, indicating that the ND protein directly or indirectly associates with nuclear DNA. Fusion of the ND protein to a LexA DNA binding domain activates transcription in Saccharomyces cerevisiae. A 75-amino-acid domain that activates transcription in both yeast and primate cells has been identified. We suggest that ND antigens may participate in the activation of transcription of specific regions of the genome.

    View details for Web of Science ID A1993LY42400023

    View details for PubMedID 8413218

  • COMPONENTS REQUIRED FOR CYTOKINESIS ARE IMPORTANT FOR BUD SITE SELECTION IN YEAST JOURNAL OF CELL BIOLOGY FLESCHER, E. G., Madden, K., Snyder, M. 1993; 122 (2): 373-386

    Abstract

    Polarized cell division is a fundamental process that occurs in a variety of organisms; it is responsible for the proper positioning of daughter cells and the correct segregation of cytoplasmic components. The SPA2 gene of yeast encodes a nonessential protein that localizes to sites of cell growth and to the site of cytokinesis. spa2 mutants exhibit slightly altered budding patterns. In this report, a genetic screen was used to isolate a novel ochre allele of CDC10, cdc10-10; strains containing this mutation require the SPA2 gene for growth. CDC10 encodes a conserved potential GTP-binding protein that previously has been shown to localize to the bud neck and to be important for cytokinesis. The genetic interaction of cdc10-10 and spa2 suggests a role for SPA2 in cytokinesis. Most importantly, strains that contain a cdc10-10 mutation and those containing mutations affecting other putative neck filament proteins do not form buds at their normal proximal location. The finding that a component involved in cytokinesis is also important in bud site selection provides strong evidence for the cytokinesis tag model; i.e., critical components at the site of cytokinesis are involved in determining the next site of polarized growth and division.

    View details for Web of Science ID A1993LM58400008

    View details for PubMedID 8320260

  • CARBON SOURCE INDUCES GROWTH OF STATIONARY PHASE YEAST-CELLS, INDEPENDENT OF CARBON SOURCE METABOLISM YEAST Granot, D., Snyder, M. 1993; 9 (5): 465-479

    Abstract

    Nutrients regulate the proliferation of many eukaryotic cells: in the absence of sufficient nutrients vegetatively growing cells will enter stationary (G0 like) phase; in the presence of sufficient nutrients non-proliferative cells will begin growth. Previously we have shown that glucose is the critical nutrient which stimulates a variety of growth-related events in the yeast Saccharomyces cerevisiae (Granot and Snyder, 1991). This paper describes six new aspects of the induction of cell growth events by nutrients in S. cerevisiae. First, all carbon sources tested, both fermentable and non-fermentable, induce growth-related events in stationary phase cells, suggesting that the carbon source is the critical nutrient which stimulates growth. Second, the continuous presence of glucose is not necessary for the induction of growth events, but rather a short 'pulse' of glucose followed by an incubation period in water will induce growth events. Third, growth stimulation by glucose occurs in the absence of the SNF3 high affinity glucose transporter. Fourth, growth stimulation occurs independent of carbon source phosphorylation and carbon source metabolism. Fifth, growth induction by carbon source does not require protein synthesis or extracellular calcium. Sixth, following stimulation by carbon source, the cells remain induced for more than 2 h after removal of the carbon source. We suggest a general model in which different carbon sources act as signals to induce the earliest growth events during or following their entry into the cell and that these growth events do not depend upon metabolism of the carbon source.

    View details for Web of Science ID A1993LD24400002

    View details for PubMedID 8322510

  • NUCLEAR-PORE COMPLEX ANTIGENS DELINEATE NUCLEAR-ENVELOPE DYNAMICS IN VEGETATIVE AND CONJUGATING SACCHAROMYCES-CEREVISIAE YEAST Copeland, C. S., Snyder, M. 1993; 9 (3): 235-249

    Abstract

    In the yeast Saccharomyces cerevisiae, the nucleus undergoes dramatic shape changes during mitosis and mating. We have studied nuclear envelope dynamics during the processes of mitosis and conjugation using nuclear pore complexes as a marker for the nuclear envelope in wild-type cells and several cell-division-cycle (cdc) mutants. Three monoclonal antibodies are described that recognize nuclear pore complex-related antigens in S. cerevisiae. One of these antibodies, RL1, has been extensively characterized by Gerace and colleagues and recognizes nuclear pore complexes in mammalian and amphibian cells. By indirect immunofluorescence of yeast cells, all three antibodies yield a discontinuous nuclear rim stain. All three react with multiple nuclear-enriched proteins in immunoblots, including the nucleoporin protein encoded by the NSP1 gene. When the antibodies were used in immunofluorescence experiments on mating cells, the nuclear pore complex staining pattern proved to be a sensitive indicator of nuclear fusion. Nuclei with closely apposed spindle pole bodies and unfused nuclear envelopes could be readily distinguished. Marked shape changes were observed in nuclei during fusion and segregation of the diploid nucleus into the zygotic bud. In cdc14 and cdc15 mutants that arrest late in mitosis, the elongated nuclear envelope extension that stretches between daughter nuclei during telophase was preserved. In cytokinesis-defective mutants (cdc3, cdc10, cdc11 and cdc12), the elongated nuclear envelope was usually resolved into two daughter nuclei in the absence of cytokinesis. These results indicate that nuclear envelope division is mechanically distinguishable from chromosome segregation, nucleolar segregation and cytokinesis.

    View details for Web of Science ID A1993KV94000003

    View details for PubMedID 8488725

  • CHROMOSOME SEGREGATION IN YEAST ANNUAL REVIEW OF MICROBIOLOGY Page, B. D., Snyder, M. 1993; 47: 231-261

    Abstract

    Because of their genetic tractability, much has been learned concerning the mechanisms of chromosome segregation in budding yeast, Saccharomyces cerevisiae, and fission yeast, Schizosaccharomyces pombe. This chapter reviews the cytology and molecular and cell biology of mitosis in both of these yeasts. Current knowledge about the components of the mitotic spindle apparatus, including spindle pole bodies, centromeres, and microtubule components and motors, is summarized. Mechanisms of mitosis such as establishment and positioning of the mitotic spindle apparatus, anaphase A, and anaphase B are reviewed.

    View details for Web of Science ID A1993MA27600009

    View details for PubMedID 8257099

  • A HOMOLOG OF THE PROTEASOME-RELATED RING10 GENE IS ESSENTIAL FOR YEAST-CELL GROWTH GENE Friedman, H., Goebel, M., Snyder, M. 1992; 122 (1): 203-206

    Abstract

    Proteasomes are intracellular protein complexes displaying multiproteolytic activities. These complexes have been implicated in the antigen degradation process that generates peptides associated with the major histocompatibility complex (MHC) class-I molecule. RING10 and RING12 are genes encoded by the class-II region of the human MHC that have sequence homology to proteasome-encoding genes. We have identified a yeast gene, called PRG1, that encodes a protein predicted to contain 55.6% sequence identity to 80% of the RING10 gene product. Genomic disruption of PRG1 revealed that it is essential for yeast cell growth. These data strongly indicate that the antigen-processing system present in vertebrates evolved from a basic cellular process present in all organisms.

    View details for Web of Science ID A1992KB09800027

    View details for PubMedID 1452031

  • THE NUCLEAR-MITOTIC APPARATUS PROTEIN IS IMPORTANT IN THE ESTABLISHMENT AND MAINTENANCE OF THE BIPOLAR MITOTIC SPINDLE APPARATUS MOLECULAR BIOLOGY OF THE CELL Yang, C. H., Snyder, M. 1992; 3 (11): 1259-1267

    Abstract

    The formation and maintenance of the bipolar mitotic spindle apparatus require a complex and balanced interplay of several mechanisms, including the stabilization and separation of polar microtubules and the action of various microtubule motors. Nonmicrotubule elements are also present throughout the spindle apparatus and have been proposed to provide a structural support for the spindle. The Nuclear-Mitotic Apparatus protein (NuMA) is an abundant 240 kD protein that is present in the nucleus of interphase cells and concentrates in the polar regions of the spindle apparatus during mitosis. Sequence analysis indicates that NuMA possesses an unusually long alpha-helical central region characteristic of many filament forming proteins. In this report we demonstrate that microinjection of anti-NuMA antibodies into interphase and prophase cells results in a failure to form a mitotic spindle apparatus. Furthermore, injection of metaphase cells results in the collapse of the spindle apparatus into a monopolar microtubule array. These results identify for the first time a nontubulin component important for both the establishment and stabilization of the mitotic spindle apparatus in multicellular organisms. We suggest that nonmicrotubule structural components may be important for these processes.

    View details for Web of Science ID A1992JZ62000007

    View details for PubMedID 1457830

  • SPECIFICATION OF SITES FOR POLARIZED GROWTH IN SACCHAROMYCES-CEREVISIAE AND THE INFLUENCE OF EXTERNAL FACTORS ON SITE SELECTION MOLECULAR BIOLOGY OF THE CELL Madden, K., Snyder, M. 1992; 3 (9): 1025-1035

    Abstract

    Many eukaryotic cell types exhibit polarized cell growth and polarized cell division at nonrandom sites. The sites of polarized growth were investigated in G1-arrested haploid Saccharomyces cerevisiae cells. When yeast cells are arrested during G1 either by treatment with alpha-factor or by shifting temperature-sensitive cdc28-1 cells to the restrictive temperature, the cells form a projection. Staining with Calcofluor reveals that in both cases the projection usually forms at axial sites (i.e., next to the previous bud scar); these are the same sites where bud formation is expected to occur. These results indicate that sites of polarized growth are specified before the end of G1. Sites of polarized growth can be influenced by external conditions. Cells grown to stationary phase and diluted into fresh medium preferentially select sites for polarized growth opposite the previous bud scar (i.e., distal sites). Incubation of cells in a mating mixture results in projection formation at nonaxial sites: presumably cells form projections toward their mating partner. These observations have important implications in understanding three aspects of cell polarity in yeast: 1) how yeast cell shape is influenced by growth conditions, 2) how sites of polarized growth are chosen, and 3) the pathway by which polarity is affected and redirected during the mating process.

    View details for Web of Science ID A1992JR05700009

    View details for PubMedID 1421575

  • CIK1 - A DEVELOPMENTALLY REGULATED SPINDLE POLE BODY-ASSOCIATED PROTEIN IMPORTANT FOR MICROTUBULE FUNCTIONS IN SACCHAROMYCES-CEREVISIAE GENES & DEVELOPMENT Page, B. D., Snyder, M. 1992; 6 (8): 1414-1429

    Abstract

    A genetic screen was devised to identify genes important for spindle pole body (SPB) and/or microtubule functions. Four mutants defective in both nuclear fusion (karyogamy) and chromosome maintenance were isolated; these mutants, termed cik (for chromosome instability and karyogamy), define three complementation groups. The CIK1 gene was cloned and characterized. Sequence analysis of the CIK1 gene predicts that the CIK1 protein is 594 amino acids in length and possesses a central 300-amino-acid coiled-coil domain. Two different CIK1-beta-galactosidase fusions localize to the SPB region in vegetative cells, and antibodies against the authentic protein detect CIK1 in the SPB region of alpha-factor-treated cells. Evaluation of cells deleted for CIK1 (cik1-delta) indicates that CIK1 is important for the formation or maintenance of a spindle apparatus. Longer and slightly more numerous microtubule bundles are visible in cik1-delta strains than in wild type. Thus, CIK1 encodes an SPB-associated component that is important for proper organization of microtubule arrays and the establishment of a spindle during vegetative growth. Furthermore, the CIK1 gene is essential for karyogamy, and the level of the CIK1 protein at the SPB appears to be dramatically induced by alpha-factor treatment. These results indicate that molecular changes occur at the microtubule-organizing center (MTOC) as the yeast cell prepares for karyogamy and imply that specialization of the MTOC or its associated microtubules occurs in preparation for particular microtubule functions in the yeast life cycle.

    View details for Web of Science ID A1992JH59900005

    View details for PubMedID 1644287

  • NUMA - AN UNUSUALLY LONG COILED-COIL RELATED PROTEIN IN THE MAMMALIAN NUCLEUS JOURNAL OF CELL BIOLOGY Yang, C. H., Lambie, E. J., Snyder, M. 1992; 116 (6): 1303-1317

    Abstract

    A bank of 892 autoimmune sera was screened by indirect immunofluorescence on mammalian cells. Six sera were identified that recognize an antigen(s) with a cell cycle-dependent localization pattern. In interphase cells, the antibodies stained the nucleus and in mitotic cells the spindle apparatus was recognized. Immunological criteria indicate that the antigen recognized by at least one of these sera corresponds to a previously identified protein called the nuclear mitotic apparatus protein (NuMA). A cDNA which partially encodes NuMA was cloned from a lambda gt11 human placental cDNA expression library, and overlapping cDNA clones that encode the entire gene were isolated. DNA sequence analysis of the clones has identified a long open reading frame capable of encoding a protein of 238 kD. Analysis of the predicted protein sequence suggests that NuMA contains an unusually large central alpha-helical domain of 1,485 amino acids flanked by nonhelical terminal domains. The central domain is similar to coiled-coil regions in structural proteins such as myosin heavy chains, cytokeratins, and nuclear lamins which are capable of forming filaments. Double immunofluorescence experiments performed with anti-NuMA and antilamin antibodies indicate that NuMA dissociates from condensing chromosomes during early prophase, before the complete disintegration of the nuclear lamina. As mitosis progresses, NuMA reassociates with telophase chromosomes very early during nuclear reformation, before substantial accumulation of lamins on chromosomal surfaces is evident. These results indicate that the NuMA proteins may be a structural component of the nucleus and may be involved in the early steps of nuclear reformation during telophase.

    View details for Web of Science ID A1992HH74900001

    View details for PubMedID 1541630

  • A SYNTHETIC LETHAL SCREEN IDENTIFIES SLK1, A NOVEL PROTEIN-KINASE HOMOLOG IMPLICATED IN YEAST-CELL MORPHOGENESIS AND CELL-GROWTH MOLECULAR AND CELLULAR BIOLOGY Costigan, C., Gehrung, S., Snyder, M. 1992; 12 (3): 1162-1178

    Abstract

    The Saccharomyces cerevisiae SPA2 protein localizes at sites involved in polarized cell growth in budding cells and mating cells. spa2 mutants have defects in projection formation during mating but are healthy during vegetative growth. A synthetic lethal screen was devised to identify mutants that require the SPA2 gene for vegetative growth. One mutant, called slk-1 (for synthetic lethal kinase), has been characterized extensively. The SLK1 gene has been cloned, and sequence analysis predicts that the SLK1 protein is 1,478 amino acid residues in length. Approximately 300 amino acids at the carboxy terminus exhibit sequence similarity with the catalytic domains of protein kinases. Disruption mutations have been constructed in the SLK1 gene. slk1 null mutants cannot grow at 37 degrees C, but many cells can grow at 30, 24, and 17 degrees C. Dead slk1 mutant cells usually have aberrant cell morphologies, and many cells are very small, approximately one-half the diameter of wild-type cells. Surviving slk1 cells also exhibit morphogenic defects; these cells are impaired in their ability to form projections upon exposure to mating pheromones. During vegetative growth, a higher fraction of slk1 cells are unbudded compared with wild-type cells, and under nutrient-limiting conditions, slk1 cells exhibit defects in cell cycle arrest. The different slk1 mutant defects are partially rescued by an extra copy of the SSD1/SRK1 gene. SSD1/SRK1 has been independently isolated as a suppressor of mutations in genes involved in growth control, sit4, pde2, bcy1, and ins1 (A. Sutton, D. Immanuel, and K.T. Arndt, Mol. Cell. Biol. 11:2133-2148, 1991; R.B. Wilson, A.A. Brenner, T.B. White, M.J. Engler, J.P. Gaughran, and K. Tatchell, Mol. Cell. Biol. 11:3369-3373, 1991). These data suggest that SLK1 plays a role in both cell morphogenesis and the control of cell growth. We speculate that SLK1 may be a regulatory link for these two cellular processes.

    View details for Web of Science ID A1992HE83800026

    View details for PubMedID 1545797

  • THE NUF1 GENE ENCODES AN ESSENTIAL COILED-COIL RELATED PROTEIN THAT IS A POTENTIAL COMPONENT OF THE YEAST NUCLEOSKELETON JOURNAL OF CELL BIOLOGY Mirzayan, C., Copeland, C. S., Snyder, M. 1992; 116 (6): 1319-1332

    Abstract

    In an attempt to identify structural components of the yeast nucleus, subcellular fractions of yeast nuclei were prepared and used as immunogens to generate complex polyclonal antibodies. One such serum was used to screen a yeast genomic lambda gt11 expression library. A clone encoding a gene called NUF1 (for nuclear filament-related) was identified and extensively characterized. Antibodies to NUF1 fusion proteins were generated, and affinity-purified antibodies were used for immunoblot analysis and indirect immunofluorescence localization. The NUF1 protein is 110 kD in molecular mass and localizes to the yeast nucleus in small granular patches. Intranuclear staining is present in cells at all stages of the cell cycle. The NUF1 protein of yeast is tightly associated with the nucleus; it was not removed by extraction of nuclei with nonionic detergent or salt, or treatment with RNase and DNase. Sequence analysis of the NUF1 gene predicts a protein 945 amino acids in length that contains three domains: a large 627-residue central domain predicted to form a coiled-coil structure flanked by nonhelical amino-terminal and carboxy-terminal regions. Disruption of the NUF1 gene indicates that it is necessary for yeast cell growth. These results indicate that NUF1 encodes an essential coiled-coil protein within the yeast nucleus; we speculate that NUF1 is a component of the yeast nucleoskeleton. In addition, immunofluorescence results indicate that mammalian cells contain a NUF1-related nuclear protein. These data in conjunction with those in the accompanying manuscript (Yang et al., 1992) lead to the hypothesis that an internal coiled-coil filamentous system may be a general structural component of the eukaryotic nucleus.

    View details for Web of Science ID A1992HH74900002

    View details for PubMedID 1541631

  • Cell polarity and morphogenesis in Saccharomyces cerevisiae. Trends in cell biology Madden, K., Costigan, C., Snyder, M. 1992; 2 (1): 22-29

    Abstract

    Polarized cell growth and division are fundamental to cellular differentiation and tissue formation in eukaryotes. Analysis of cell polarity in the budding yeast Saccharomyces cerevisiae has allowed the identification of many regulatory, secretory and cytoskeletal components involved in these processes, as well as the elucidation of various steps in these events. Many of these components and processes may be similar in other eukaryotes.

    View details for PubMedID 14731634

  • THE KNS1 GENE OF SACCHAROMYCES-CEREVISIAE ENCODES A NONESSENTIAL PROTEIN-KINASE HOMOLOG THAT IS DISTANTLY RELATED TO MEMBERS OF THE CDC28/CDC2 GENE FAMILY MOLECULAR & GENERAL GENETICS Padmanabha, R., Gehrung, S., Snyder, M. 1991; 229 (1): 1-9

    Abstract

    A novel protein kinase homologue (KNS1) has been identified in Saccharomyces cerevisiae. KNS1 contains an open reading frame of 720 codons. The carboxy-terminal portion of the predicted protein sequence is similar to that of many other protein kinases, exhibiting 36% identity to the cdc2 gene product of Schizosaccharomyces pombe and 34% identity to the CDC28 gene product of S. cerevisiae. Deletion mutations were constructed in the KNS1 gene. kns1 mutants grow at the same rate as wild-type cells using several different carbon sources. They mate at normal efficiencies, and they sporulate successfully. No defects were found in entry into or exit from stationary phase. Thus, the KNS1 gene is not essential for cell growth and a variety of other cellular processes in yeast.

    View details for Web of Science ID A1991GF17600001

    View details for PubMedID 1910150

  • STUDIES CONCERNING THE TEMPORAL AND GENETIC-CONTROL OF CELL POLARITY IN SACCHAROMYCES-CEREVISIAE JOURNAL OF CELL BIOLOGY Snyder, M., Gehrung, S., Page, B. D. 1991; 114 (3): 515-532

    Abstract

    The establishment of cell polarity was examined in the budding yeast, S. cerevisiae. The distribution of a polarized protein, the SPA2 protein, was followed throughout the yeast cell cycle using synchronized cells and cdc mutants. The SPA2 protein localizes to a patch at the presumptive bud site of G1 cells. Later it concentrates at the bud tip in budded cells. At cytokinesis, the SPA2 protein is at the neck between the mother and daughter cells. Analysis of unbudded haploid cells has suggested a series of events that occurs during G1. The SPA2 patch is established very early in G1, while the spindle pole body resides on the distal side of the nucleus. Later, microtubules emanating from the spindle pole body intersect the SPA2 crescent, and the nucleus probably rotates towards the SPA2 patch. By middle G1, most cells contain the SPB on the side of the nucleus proximal to the SPA2 patch, and a long extranuclear microtubule bundle intersects this patch. We suggest that a microtubule capture site exists in the SPA2 staining region that stabilizes the long microtubule bundle; this capture site may be responsible for rotation of the nucleus. Cells containing a polarized distribution of the SPA2 protein also possess a polarized distribution of actin spots in the same region, although the actin staining is much more diffuse. Moreover, cdc4 mutants, which form multiple buds at the restrictive temperature, exhibit simultaneous staining of the SPA2 protein and actin spots in a subset of the bud tips. spa2 mutants contain a polarized distribution of actin spots, and act1-1 and act1-2 mutants often contain a polarized distribution of the SPA2 protein, suggesting that the SPA2 protein is not required for localization of the actin spots and the actin spots are not required for localization of the SPA2 protein. cdc24 mutants, which fail to form buds at the restrictive temperature, fail to exhibit polarized localization of the SPA2 protein and actin spots, indicating that the CDC24 protein is directly or indirectly responsible for controlling the polarity of these proteins. Based on the cell cycle distribution of the SPA2 protein, a "cytokinesis tag" model is proposed to explain the mechanism of the non-random positioning of bud sites in haploid yeast cells.

    View details for Web of Science ID A1991FY14900012

    View details for PubMedID 1860883

  • GLUCOSE INDUCES CAMP-INDEPENDENT GROWTH-RELATED CHANGES IN STATIONARY-PHASE CELLS OF SACCHAROMYCES-CEREVISIAE PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Granot, D., Snyder, M. 1991; 88 (13): 5724-5728

    Abstract

    Nutrients play a critical role in the decision to initiate a new cell cycle. Addition of nutrients to arrested cells such as stationary-phase cells and spores induces them to begin growth. We have analyzed the nutrients required to induce early cellular events in yeast. When stationary-phase cells or spores are incubated in the presence of only glucose, morphological and physiological changes characteristic of mitotically growing cells are induced and, in the absence of additional nutrients to support growth, the cells rapidly lose viability. Preincubation of stationary-phase cells in the presence of glucose decreases the time required to reach bud emergence upon the subsequent addition of rich medium. These processes are specifically induced by D-glucose and not by other components such as nitrogen source or L-glucose. The glucose-induced events are independent of the adenylate cyclase pathway, since strains with a temperature-sensitive mutation in either the adenylate cyclase gene (CDC35) or its regulator (CDC25) undergo glucose-induced cellular changes when incubated at the restrictive temperature. We suggest that glucose triggers events in the induction of a new mitotic cell cycle and that these events are either prior to the adenylate cyclase pathway or are in an alternative pathway.

    View details for Web of Science ID A1991FU90100051

    View details for PubMedID 1648229

  • SEGREGATION OF THE NUCLEOLUS DURING MITOSIS IN BUDDING AND FISSION YEAST CELL MOTILITY AND THE CYTOSKELETON Granot, D., Snyder, M. 1991; 20 (1): 47-54

    Abstract

    The segregation of the nucleolus during mitosis was examined in Saccharomyces cerevisiae and Schizosaccharomyces pombe by indirect immunofluorescence using antibodies directed to highly conserved anti-nucleolus antigens. In mitotic S. pombe cells, the nucleolus appears to trail the bulk of the DNA. In wild-type cells of S. cerevisiae, the nucleolus segregates alongside the bulk of the genomic DNA. Based on its distance from the centromere, we would expect the rDNA in both organisms to segregate behind the majority of the genomic DNA, if telomeric regions trail centromeric regions as in other eukaryotes. We therefore suggest that in S. cerevisiae the nucleolus is attached to other parts of the nucleus which enable it to segregate along with the bulk of the DNA. The segregation of the nucleolus in topoisomerase mutants and nuclear division mutants of S. cerevisiae was also investigated. In cdc14 mutants which arrest at late anaphase, the vast majority of the DNA is separated, but the nucleolar antigens remain extended between the mother and daughter cells. Thus, the CDC14 gene of S. cerevisiae appears to be important for the separation of the nucleolus at mitosis.

    View details for Web of Science ID A1991GD91400005

    View details for PubMedID 1661641

  • THE SPA2 GENE OF SACCHAROMYCES-CEREVISIAE IS IMPORTANT FOR PHEROMONE-INDUCED MORPHOGENESIS AND EFFICIENT MATING JOURNAL OF CELL BIOLOGY Gehrung, S., Snyder, M. 1990; 111 (4): 1451-1464

    Abstract

    Upon exposure to mating pheromone, Saccharomyces cerevisiae undergoes cellular differentiation to form a morphologically distinct cell called a "shmoo". Double staining experiments revealed that both the SPA2 protein and actin localize to the shmoo tip which is the site of polarized cell growth. Actin concentrates as spots throughout the shmoo projection, while SPA2 localizes as a sharp patch at the shmoo tip. DNA sequence analysis of the SPA2 gene revealed an open reading frame 1,466 codons in length; the predicted protein sequence contains many internal repeats including a nine amino acid sequence that is imperfectly repeated 25 times. Portions of the SPA2 sequence exhibit a low-level similarity to proteins containing coiled-coil structures. Yeast cells containing a large deletion of the SPA2 gene are similar in growth rate to wild-type cells. However, spa2 mutant cells are impaired in their ability to form shmoos upon exposure to mating pheromone, and they do not mate efficiently with other spa2 mutant cells. Thus, we suggest that the SPA2 protein plays a critical role in cellular morphogenesis during mating, perhaps as a cytoskeletal protein.

    View details for Web of Science ID A1990EA35400012

    View details for PubMedID 2211820

  • HIGHER-ORDER STRUCTURE IS PRESENT IN THE YEAST NUCLEUS - AUTOANTIBODY PROBES DEMONSTRATE THAT THE NUCLEOLUS LIES OPPOSITE THE SPINDLE POLE BODY CHROMOSOMA Yang, C. H., Lambie, E. J., Hardin, J., Craft, J., Snyder, M. 1989; 98 (2): 123-128

    Abstract

    A panel of sera from 892 autoimmune patients was screened by indirect immunofluorescence on mammalian cells. Seventy-three sera were identified that recognize the nucleolus. Three of these sera appear to stain the nucleolus in yeast, suggesting that they recognize highly conserved antigens. These three sera also immunoprecipitate mammalian U3 snRNA-containing particles, which reside in the nucleolus and have been implicated in rRNA processing. Double immunofluorescence experiments with anti-nucleolus and anti-tubulin antibodies revealed a novel form of non-random nuclear organization in yeast. The spindle pole body and the nucleolus, both of which are associated with the nuclear envelope, preferentially localize at opposite ends of the nucleus. Organization of these and other components into specific regions of the nucleus may be important for optimizing their proper function.

    View details for Web of Science ID A1989AK25300008

    View details for PubMedID 2673672

  • THE SPA2 PROTEIN OF YEAST LOCALIZES TO SITES OF CELL-GROWTH JOURNAL OF CELL BIOLOGY Snyder, M. 1989; 108 (4): 1419-1429

    Abstract

    A yeast gene, SPA2, was isolated with human anti-spindle pole autoantibodies. The SPA2 gene was fused to the Escherichia coli trpE gene, and polyclonal antibodies were prepared to the fusion protein. Immunofluorescence experiments indicate that the SPA2 gene product has a sharply polarized distribution in yeast cells. In budded cells the SPA2 protein is present at the tip of the bud; in unbudded cells, it is localized to one edge of the cell. When a-cells are induced to form schmoos with alpha-factor, the SPA2 protein is found at the tip of the schmoo. These areas of SPA2 localization correspond to cellular sites expected to be involved in bud formation and/or cell growth. The SPA2 antigen is present in a-cells, alpha-cells, and a/alpha-diploid cells, but is absent in mutant cells in which the SPA2 gene has been disrupted. spa2 mutant cells are viable, but display defects in the direction and control of cell growth. Compared to wild-type cells, spa2 mutant cells have slightly altered budding patterns. Entry into stationary phase is impaired for spa2 mutants, and mutants with one particular allele, spa2-7, form multiple buds under nutrient-limiting conditions. Thus, SPA2 is a newly identified yeast gene that is involved in the direction and control of cell division, and whose gene product localizes to the site of cell growth.

    View details for Web of Science ID A1989T953300022

    View details for PubMedID 2647769

  • GENOMIC ORGANIZATION OF TRANSFER-RNA AND AMINOACYL-TRNA SYNTHETASE GENES FOR 2 AMINO-ACIDS IN SACCHAROMYCES-CEREVISIAE GENOMICS Kolman, C. J., Snyder, M., Soll, D. 1988; 3 (3): 201-206

    Abstract

    The genomic organization in Saccharomyces cerevisiae of the tRNA and aminoacyl-tRNA synthetase genes for two amino acids was investigated. Aspartic acid and serine were chosen for the study because of the number and diversity of their tRNA gene sequences and the availability of cloned tRNA and aminoacyl-tRNA synthetase genes. Chromosome assignments were determined by hybridization to DNA gel blots of chromosomal DNA resolved by contour-clamped homogeneous electric field gel electrophoresis. Our results show that the tRNA and the cognate synthetase genes in such a family are dispersed and, therefore, cannot be regulated via a mechanism dependent on close proximity of genes. In general, the genome of S. cerevisiae contains randomly dispersed tRNA genes that are transcribed individually. We have supported and expanded this view by applying the facile method of contour-clamped homogeneous electric field gel electrophoresis to the investigation of these small multigene families.

    View details for Web of Science ID A1988R066400004

    View details for PubMedID 3066745

  • DNA GYRASE ON THE BACTERIAL CHROMOSOME - DNA CLEAVAGE INDUCED BY OXOLINIC ACID JOURNAL OF MOLECULAR BIOLOGY Snyder, M., Drlica, K. 1979; 131 (2): 287-302

    View details for Web of Science ID A1979HE03000008

    View details for PubMedID 226717

  • SUPERHELICAL ESCHERICHIA-COLI DNA - RELAXATION BY COUMERMYCIN JOURNAL OF MOLECULAR BIOLOGY Drlica, K., Snyder, M. 1978; 120 (2): 145-154

    View details for Web of Science ID A1978EX21600001

    View details for PubMedID 347091