Stephen B. Montgomery
Stanford Medicine Professor of Pathology, Professor of Genetics and of Biomedical Data Science
Bio
Stephen Montgomery is an Endowed Professor of Pathology, Genetics, Biomedical Data Science and, by courtesy, Computer Science at Stanford University. He has trained in multiple countries, including Canada, Germany, England, and Switzerland. He is best known for his work mapping the effects of genetic variation on gene expression. He authored the first publications comparing whole-genome and transcriptome data within a human population and pioneered the use of molecular outliers to identify impactful rare variants (Montgomery et al., 2010; Montgomery et al., 2011).
Montgomery and his lab lead major genomics initiatives to understand the molecular mechanisms that underlie disease-associated variation. In 2017, they published analyses from the Genotype-Tissue Expression (GTEx) Consortium characterizing the impact of genetic variation on gene expression across tissues of the human body (GTEx Consortium, 2017). In 2024, his lab led major analyses in the NIH Common Fund MoTrPAC study identifying the molecular effects of exercise training across rat tissues (MoTrPAC, 2024). He is a Principal Investigator within multiple major NIH consortia, including GREGoR, MoTrPAC, TOPMed, and Functional ADSP, and an Investigator in the Developmental GTEx, IGVF, SMaHT, All of Us, Undiagnosed Diseases Network, and ENCODE4 consortia, reflecting his lab's ongoing role in multiple major genomics projects.
The Montgomery lab has a specific focus on mapping the molecular effects of rare and environment-responsive genetic variants. His laboratory develops approaches for studying rare variants (such as Li et al., 2017; Ferraro et al., 2020) and applies them to uncover novel disease biology and to diagnose individuals with genetic diseases (Fresard et al., 2017). As a PI of the GREGoR Stanford Site, his lab develops and applies these strategies to diagnose individuals with undiagnosed rare diseases. The GREGoR Stanford Site is currently recruiting 500 families in California with unsolved diagnoses, applying novel multi-omics and computational strategies to achieve diagnoses. His laboratory also focuses on understanding the molecular consequences of structural variants and chromosomal copy number changes (Marderstein et al., 2024).
The Montgomery lab is also focused on advancing our understanding of common genetic variants and understudied RNAs. For example, his lab has demonstrated that multiple causal variants can underlie a single genetic association (Abell et al., 2022) and has developed approaches to identify impactful long non-coding RNAs that contribute to complex disease (de Goede et al., 2021). Ongoing work in his lab focuses on neurodegenerative and neurodevelopmental traits.
Montgomery is an active member of both the Stanford and broader research communities. Among his contributions, he serves as co-director of an NHGRI PhD T32 training grant, as a member of the Pathology DEI committee, and as Faculty Director of Graduate Admissions for the Biomedical Data Science program; he also served for four years as a Stanford University Faculty Senator. He has served, or currently serves, on the program committees of major conferences such as ASHG, AGBT, and WTSI Genomics of Rare Diseases. He is the incoming chair of the ASHG Awards committee and a standing member of the NIH GHD Study Section.
In 2019, Montgomery was awarded the annual American Society of Human Genetics Early Career Award for his multi-faceted impacts on human genetics and genomics. In 2023, he was awarded the annual Stanford Prize in Population Genetics and Society. In 2024, he was awarded the Stanford Pathology Research Mentor Award.
Academic Appointments
- Professor, Pathology
- Professor, Genetics
- Professor, Department of Biomedical Data Science
- Member, Bio-X
- Member, Cardiovascular Institute
Administrative Appointments
- Director of Genome Informatics, Department of Pathology (2011-present)
Professional Education
- B.A.Sc., University of British Columbia, Engineering Physics (2002)
- Ph.D., University of British Columbia, Genetics (2006)
Current Research and Scholarly Interests
We focus on understanding the effects of genome variation on cellular phenotypes, and on cellular modeling of disease, using genomic approaches such as next-generation RNA sequencing combined with the development and application of state-of-the-art bioinformatics and statistical genetics methods. See our website at http://montgomerylab.stanford.edu/
2024-25 Courses
- Informatics in Industry: BIOMEDIN 206 (Spr)
Independent Studies (21)
- Advanced Reading and Research: CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research: CS 499P (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods: BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Curricular Practical Training: CS 390A (Aut, Win, Spr, Sum)
- Directed Investigation: BIOE 392 (Aut, Win, Spr, Sum)
- Directed Reading and Research: BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics: GENE 299 (Aut, Win, Spr, Sum)
- Directed Reading in Pathology: PATH 299 (Aut, Win, Spr, Sum)
- Directed Study: BIOE 391 (Aut, Win, Spr, Sum)
- Early Clinical Experience in Pathology: PATH 280 (Aut, Win, Spr, Sum)
- Graduate Research: GENE 399 (Aut, Win, Spr, Sum)
- Graduate Research: IMMUNOL 399 (Aut, Win, Spr, Sum)
- Graduate Research: PATH 399 (Aut, Win, Spr, Sum)
- Independent Project: CS 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research: BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research: GENE 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research: PATH 370 (Aut, Win, Spr, Sum)
- Supervised Study: GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research: GENE 199 (Aut, Win, Spr, Sum)
- Undergraduate Research: PATH 199 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Research Project: CS 191W (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Informatics in Industry: BIOMEDIN 206 (Spr)
2022-23 Courses
- Informatics in Industry: BIOMEDIN 206 (Spr)
Stanford Advisees
- Doctoral Dissertation Reader (AC): Jon Bezney, Michael Hayes, Jodie Lunger, Taylor Pursell, Alp Tartici
- Postdoctoral Faculty Sponsor: Iman Jaljuli, Nick Lashinsky, Evin Padhi, Yilin Xie
- Doctoral Dissertation Advisor (AC): Sohaib Hassan, Tanner Jensen, Julie Lake, Kate Lawrence, Maggie Maurer, Sherry Yang
- Master's Program Advisor: Zachary Cadiz, Daniel Guo, Robert Igbokwe, Caroline Van, Ananya Vasireddy
- Doctoral Dissertation Co-Advisor (AC): Kameron Rodrigues
- Doctoral (Program): Ben Ehlert, Samson Mataraso, Esther Robb, Min Sun, Christine Yiwen Yeh
All Publications
-
regionalpcs improve discovery of DNA methylation associations with complex traits.
Nature communications
2025; 16 (1): 368
Abstract
We have developed the regionalpcs method, an approach for summarizing gene-level methylation. regionalpcs addresses the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease. In contrast to averaging, regionalpcs uses principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrates a 54% improvement in sensitivity over averaging in simulations, providing a robust framework for identifying subtle epigenetic variations. Applying regionalpcs to Alzheimer's disease brain methylation data, combined with cell type deconvolution, we uncover 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci with genome-wide association studies identified 17 genes with potential causal roles in Alzheimer's disease risk, including MS4A4A and PICALM. Available in the Bioconductor package regionalpcs, our approach facilitates a deeper understanding of the epigenetic landscape in Alzheimer's disease and opens avenues for research into complex diseases.
DOI: 10.1038/s41467-024-55698-6
PubMed ID: 39753567
PubMed Central ID: PMC11698866
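The abstract above contrasts averaging with principal components analysis for summarizing methylation across a gene region. As a minimal sketch of that contrast only, assuming a samples-by-CpGs matrix for one region, the Python/NumPy example below summarizes a simulated region both ways. It is not the regionalpcs Bioconductor package (an R package whose API and component-selection rule are not reproduced here); the function names, the fixed `n_pcs=2`, and the simulated data are hypothetical.

```python
import numpy as np


def region_average(meth):
    """Conventional summary: mean methylation across a region's CpGs, one value per sample."""
    return meth.mean(axis=1)


def regional_pc_scores(meth, n_pcs=2):
    """Summarize one gene region by the scores of its top principal components.

    meth: array of shape (n_samples, n_cpgs) with methylation values for the
    CpGs assigned to one region. Returns shape (n_samples, k), k <= n_pcs.
    n_pcs is a fixed, hypothetical choice in this sketch.
    """
    centered = meth - meth.mean(axis=0)  # center each CpG across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k]  # PC scores: projections of samples onto top components


# Hypothetical simulated region: 100 samples, 10 CpGs. Half of the CpGs gain
# methylation with the phenotype and half lose it, so the effects cancel in the mean.
rng = np.random.default_rng(0)
phenotype = np.repeat([0.0, 1.0], 50)
meth = rng.normal(0.0, 0.3, size=(100, 10))
meth[:, :5] += phenotype[:, None]
meth[:, 5:] -= phenotype[:, None]

avg = region_average(meth)
pcs = regional_pc_scores(meth)

# Absolute correlations, because the sign of a principal component is arbitrary.
r_avg = abs(np.corrcoef(avg, phenotype)[0, 1])
r_pc1 = abs(np.corrcoef(pcs[:, 0], phenotype)[0, 1])
print(f"|r| average: {r_avg:.2f}, |r| PC1: {r_pc1:.2f}")
```

Because the simulated CpGs shift in opposite directions, the regional mean carries almost no phenotype signal, while the first PC score tracks the phenotype closely. This illustrates the kind of pattern the abstract says averaging can miss.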
-
High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.
Genome research
2024
Abstract
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
DOI: 10.1101/gr.279273.124
PubMed ID: 39358015
-
Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases.
Orphanet journal of rare diseases
2024; 19 (1): 357
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
DOI: 10.1186/s13023-024-03361-0
PubMed ID: 39334316
PubMed Central ID: PMC11438178
-
Single-cell multi-omics map of human fetal blood in Down syndrome.
Nature
2024
Abstract
Down syndrome predisposes individuals to haematological abnormalities, such as increased number of erythrocytes and leukaemia in a process that is initiated before birth and is not entirely understood [1-3]. Here, to understand dysregulated haematopoiesis in Down syndrome, we integrated single-cell transcriptomics of over 1.1 million cells with chromatin accessibility and spatial transcriptomics datasets using human fetal liver and bone marrow samples from 3 fetuses with disomy and 15 fetuses with trisomy. We found that differences in gene expression in Down syndrome were dependent on both cell type and environment. Furthermore, we found multiple lines of evidence that haematopoietic stem cells (HSCs) in Down syndrome are 'primed' to differentiate. We subsequently established a Down syndrome-specific map linking non-coding elements to genes in disomic and trisomic HSCs using 10X multiome data. By integrating this map with genetic variants associated with blood cell counts, we discovered that trisomy restructured regulatory interactions to dysregulate enhancer activity and gene expression critical to erythroid lineage differentiation. Furthermore, as mutations in Down syndrome display a signature of oxidative stress [4,5], we validated both increased mitochondrial mass and oxidative stress in Down syndrome, and observed that these mutations preferentially fell into regulatory regions of expressed genes in HSCs. Together, our single-cell, multi-omic resource provides a high-resolution molecular map of fetal haematopoiesis in Down syndrome and indicates significant regulatory restructuring giving rise to co-occurring haematological conditions.
DOI: 10.1038/s41586-024-07946-4
PubMed ID: 39322663
PubMed Central ID: 2480572
-
A lymphocyte chemoaffinity axis for lung, non-intestinal mucosae and CNS.
Nature
2024
Abstract
Tissue-selective chemoattractants direct lymphocytes to epithelial surfaces to establish local immune environments, regulate immune responses to food antigens and commensal organisms, and protect from pathogens. Homeostatic chemoattractants for small intestines, colon, and skin are known [1,2], but chemotropic mechanisms selective for respiratory tract and other non-intestinal mucosal tissues (NIMT) remain poorly understood. Here we leveraged diverse omics datasets to identify GPR25 as a lymphocyte receptor for CXCL17, a chemoattractant cytokine whose expression by epithelial cells of airways, upper gastrointestinal and squamous mucosae unifies the NIMT and distinguishes them from intestinal mucosae. Single-cell transcriptomic analyses show that GPR25 is induced on innate lymphocytes prior to emigration to the periphery, and is imprinted in secondary lymphoid tissues on activated B and T cells responding to immune challenge. GPR25 characterizes B and T tissue resident memory and regulatory T lymphocytes in NIMT and lungs in humans and mediates lymphocyte homing to barrier epithelia of the airways, oral cavity, stomach, biliary and genitourinary tracts in mouse models. GPR25 is also expressed by T cells in cerebrospinal fluid and CXCL17 by neurons, suggesting a role in CNS immune regulation. We reveal widespread imprinting of GPR25 on regulatory T cells, suggesting a mechanistic link to population genetic evidence that GPR25 is protective in autoimmunity [3,4]. Our results define a GPR25-CXCL17 chemoaffinity axis with the potential to integrate immunity and tolerance at non-intestinal mucosae and the CNS.
DOI: 10.1038/s41586-024-08043-2
PubMed ID: 39293486
-
Single-cell multi-omics map of human foetal blood in Down's syndrome.
Elsevier Science Inc. 2024
Web of Science ID: 001343414100083
-
Single-cell multi-omics map of human foetal blood in Down's syndrome.
Elsevier Science Inc. 2024
Web of Science ID: 001325038400023
-
De novo variants in the RNU4-2 snRNA cause a frequent neurodevelopmental syndrome.
Nature
2024
Abstract
Around 60% of individuals with neurodevelopmental disorders (NDD) remain undiagnosed after comprehensive genetic testing, primarily of protein-coding genes [1]. Large genome-sequenced cohorts are improving our ability to discover new diagnoses in the non-coding genome. Here, we identify the non-coding RNA RNU4-2 as a syndromic NDD gene. RNU4-2 encodes the U4 small nuclear RNA (snRNA), which is a critical component of the U4/U6.U5 tri-snRNP complex of the major spliceosome [2]. We identify an 18 bp region of RNU4-2 mapping to two structural elements in the U4/U6 snRNA duplex (the T-loop and Stem III) that is severely depleted of variation in the general population, but in which we identify heterozygous variants in 115 individuals with NDD. Most individuals (77.4%) have the same highly recurrent single base insertion (n.64_65insT). In 54 individuals where it could be determined, the de novo variants were all on the maternal allele. We demonstrate that RNU4-2 is highly expressed in the developing human brain, in contrast to RNU4-1 and other U4 homologs. Using RNA-sequencing, we show how 5' splice site usage is systematically disrupted in individuals with RNU4-2 variants, consistent with the known role of this region during spliceosome activation. Finally, we estimate that variants in this 18 bp region explain 0.4% of individuals with NDD. This work underscores the importance of non-coding genes in rare disorders and will provide a diagnosis to thousands of individuals with NDD worldwide.
DOI: 10.1038/s41586-024-07773-7
PubMed ID: 38991538
-
Impact of genome build on RNA-seq interpretation and diagnostics.
American journal of human genetics
2024
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To determine the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network and Genomics Research to Elucidate the Genetics of Rare Disease Consortium. Across six routinely collected biospecimens, 61% of quantified genes were not influenced by genome build. However, we identified 1,492 genes with build-dependent quantification, 3,377 genes with build-exclusive expression, and 9,077 genes with annotation-specific expression across six routinely collected biospecimens, including 566 clinically relevant and 512 known OMIM genes. Further, we demonstrate that between builds for a given gene, a larger difference in quantification is well correlated with a larger change in expression outlier calling. Combined, we provide a database of genes impacted by build choice and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.
DOI: 10.1016/j.ajhg.2024.05.005
PubMed ID: 38834072
-
Loss of function of FAM177A1, a Golgi complex localized protein, causes a novel neurodevelopmental disorder.
Genetics in medicine : official journal of the American College of Medical Genetics
2024: 101166
Abstract
The function of FAM177A1 and its relationship to human disease is largely unknown. Recent studies have demonstrated FAM177A1 to be a critical immune-associated gene. One previous case study has linked FAM177A1 to a neurodevelopmental disorder in four siblings. We identified five individuals from three unrelated families with biallelic variants in FAM177A1. The physiological function of FAM177A1 was studied in a zebrafish model organism and human cell lines with loss-of-function variants similar to the affected cohort. These individuals share a characteristic phenotype defined by macrocephaly, global developmental delay, intellectual disability, seizures, behavioral abnormalities, hypotonia, and gait disturbance. We show that FAM177A1 localizes to the Golgi complex in mammalian and zebrafish cells. Intersection of the RNA-seq and metabolomic datasets from FAM177A1-deficient human fibroblasts and whole zebrafish larvae demonstrated dysregulation of pathways associated with apoptosis, inflammation, and negative regulation of cell proliferation. Our data sheds light on the emerging function of FAM177A1 and defines FAM177A1-related neurodevelopmental disorder as a new clinical entity.
DOI: 10.1016/j.gim.2024.101166
PubMed ID: 38767059
-
The impact of exercise on gene regulation in association with complex trait genetics.
Nature communications
2024; 15 (1): 3346
Abstract
Endurance exercise training is known to reduce risk for a range of complex diseases. However, the molecular basis of this effect has been challenging to study and largely restricted to analyses of either few or easily biopsied tissues. Extensive transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium has provided a unique opportunity to clarify how exercise can affect tissue-specific gene expression and further suggest how exercise adaptation may impact complex disease-associated genes. To build this map, we integrate this multi-tissue atlas of gene expression changes with gene-disease targets, genetic regulation of expression, and trait relationship data in humans. Consensus from multiple approaches prioritizes specific tissues and genes where endurance exercise impacts disease-relevant gene expression. Specifically, we identify a total of 5523 trait-tissue-gene triplets to serve as a valuable starting point for future investigations [Exercise; Transcription; Human Phenotypic Variation].
DOI: 10.1038/s41467-024-45966-w
PubMed ID: 38693125
-
Temporal dynamics of the multi-omic response to endurance exercise training.
Nature
2024; 629 (8010): 174-183
Abstract
Regular exercise promotes whole-body health and prevents disease, but the underlying molecular mechanisms are incompletely understood [1-3]. Here, the Molecular Transducers of Physical Activity Consortium [4] profiled the temporal transcriptome, proteome, metabolome, lipidome, phosphoproteome, acetylproteome, ubiquitylproteome, epigenome and immunome in whole blood, plasma and 18 solid tissues in male and female Rattus norvegicus over eight weeks of endurance exercise training. The resulting data compendium encompasses 9,466 assays across 19 tissues, 25 molecular platforms and 4 training time points. Thousands of shared and tissue-specific molecular alterations were identified, with sex differences found in multiple tissues. Temporal multi-omic and multi-tissue analyses revealed expansive biological insights into the adaptive responses to endurance training, including widespread regulation of immune, metabolic, stress response and mitochondrial pathways. Many changes were relevant to human health, including non-alcoholic fatty liver disease, inflammatory bowel disease, cardiovascular health and tissue injury and recovery. The data and analyses presented in this study will serve as valuable resources for understanding and exploring the multi-tissue molecular effects of endurance training and are provided in a public repository ( https://motrpac-data.org/ ).
DOI: 10.1038/s41586-023-06877-w
PubMed ID: 38693412
PubMed Central ID: PMC11062907
-
regionalpcs: improved discovery of DNA methylation associations with complex traits.
bioRxiv : the preprint server for biology
2024
Abstract
We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.
DOI: 10.1101/2024.05.01.590171
PubMed ID: 38746367
PubMed Central ID: PMC11092597
-
Sexual dimorphism and the multi-omic response to exercise training in rat subcutaneous white adipose tissue.
Nature metabolism
2024
Abstract
Subcutaneous white adipose tissue (scWAT) is a dynamic storage and secretory organ that regulates systemic homeostasis, yet the impact of endurance exercise training (ExT) and sex on its molecular landscape is not fully established. Utilizing an integrative multi-omics approach, and leveraging data generated by the Molecular Transducers of Physical Activity Consortium (MoTrPAC), we show profound sexual dimorphism in the scWAT of sedentary rats and in the dynamic response of this tissue to ExT. Specifically, the scWAT of sedentary females displays -omic signatures related to insulin signaling and adipogenesis, whereas the scWAT of sedentary males is enriched in terms related to aerobic metabolism. These sex-specific -omic signatures are preserved or amplified with ExT. Integration of multi-omic analyses with phenotypic measures identifies molecular hubs predicted to drive sexually distinct responses to training. Overall, this study underscores the powerful impact of sex on adipose tissue biology and provides a rich resource to investigate the scWAT response to ExT.
DOI: 10.1038/s42255-023-00959-9
PubMed ID: 38693320
-
Molecular adaptations in response to exercise training are associated with tissue-specific transcriptomic and epigenomic signatures.
Cell genomics
2024: 100421
Abstract
Regular exercise has many physical and brain health benefits, yet the molecular mechanisms mediating exercise effects across tissues remain poorly understood. Here we analyzed 400 high-quality DNA methylation, ATAC-seq, and RNA-seq datasets from eight tissues from control and endurance exercise-trained (EET) rats. Integration of baseline datasets mapped the gene location dependence of epigenetic control features and identified differing regulatory landscapes in each tissue. The transcriptional responses to 8 weeks of EET showed little overlap across tissues and predominantly comprised tissue-type enriched genes. We identified sex differences in the transcriptomic and epigenomic changes induced by EET. However, the sex-biased gene responses were linked to shared signaling pathways. We found that many G protein-coupled receptor-encoding genes are regulated by EET, suggesting a role for these receptors in mediating the molecular adaptations to training across tissues. Our findings provide new insights into the mechanisms underlying EET-induced health benefits across organs.
DOI: 10.1016/j.xgen.2023.100421
PubMed ID: 38697122
-
Molecular Transducers of Physical Activity Consortium (MoTrPAC): Human Studies Design and Protocol.
Journal of applied physiology (Bethesda, Md. : 1985)
2024
Abstract
Physical activity, including structured exercise, is associated with favorable health-related chronic disease outcomes. While there is evidence of various molecular pathways that affect these responses, a comprehensive molecular map of these molecular responses to exercise has not been developed. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) is a multi-center study designed to isolate the effects of structured exercise training on the molecular mechanisms underlying the health benefits of exercise and physical activity. MoTrPAC contains both a pre-clinical and human component. The details of the human studies component of MoTrPAC that include the design and methods are presented here. The human studies contain both an adult and pediatric component. In the adult component, sedentary participants are randomized to 12 weeks of Control, Endurance Exercise Training, or Resistance Exercise Training with outcome measures completed before and following the 12 weeks. The adult component also includes recruitment of highly active endurance trained or resistance trained participants who only complete measures once. A similar design is used for the pediatric component; however, only endurance exercise is examined. Phenotyping measures include weight, body composition, vital signs, cardiorespiratory fitness, muscular strength, physical activity and diet, and other questionnaires. Participants also complete an acute rest period (adults only) or exercise session (adults, pediatrics) with collection of biospecimens (blood only for pediatrics) to allow for examination of the molecular responses. The design and methods of MoTrPAC may inform other studies. Moreover, MoTrPAC will provide a repository of data that can be used broadly across the scientific community.
DOI: 10.1152/japplphysiol.00102.2024
PubMed ID: 38634503
-
The mitochondrial multi-omic response to exercise training across rat tissues.
Cell metabolism
2024
Abstract
Mitochondria have diverse functions critical to whole-body metabolic homeostasis. Endurance training alters mitochondrial activity, but systematic characterization of these adaptations is lacking. Here, the Molecular Transducers of Physical Activity Consortium mapped the temporal, multi-omic changes in mitochondrial analytes across 19 tissues in male and female rats trained for 1, 2, 4, or 8 weeks. Training elicited substantial changes in the adrenal gland, brown adipose, colon, heart, and skeletal muscle. The colon showed non-linear response dynamics, whereas mitochondrial pathways were downregulated in brown adipose and adrenal tissues. Protein acetylation increased in the liver, with a shift in lipid metabolism, whereas oxidative proteins increased in striated muscles. Exercise-upregulated networks were downregulated in human diabetes and cirrhosis. Knockdown of the central network protein 17-beta-hydroxysteroid dehydrogenase 10 (HSD17B10) elevated oxygen consumption, indicative of metabolic stress. We provide a multi-omic, multi-tissue, temporal atlas of the mitochondrial response to exercise training and identify candidates linked to mitochondrial dysfunction.
DOI: 10.1016/j.cmet.2023.12.021
PubMed ID: 38701776
-
De novo variants in the non-coding spliceosomal snRNA gene RNU4-2 are a frequent cause of syndromic neurodevelopmental disorders.
medRxiv : the preprint server for health sciences
2024
Abstract
Around 60% of individuals with neurodevelopmental disorders (NDD) remain undiagnosed after comprehensive genetic testing, primarily of protein-coding genes [1]. Increasingly, large genome-sequenced cohorts are improving our ability to discover new diagnoses in the non-coding genome. Here, we identify the non-coding RNA RNU4-2 as a novel syndromic NDD gene. RNU4-2 encodes the U4 small nuclear RNA (snRNA), which is a critical component of the U4/U6.U5 tri-snRNP complex of the major spliceosome [2]. We identify an 18 bp region of RNU4-2 mapping to two structural elements in the U4/U6 snRNA duplex (the T-loop and Stem III) that is severely depleted of variation in the general population, but in which we identify heterozygous variants in 119 individuals with NDD. The vast majority of individuals (77.3%) have the same highly recurrent single base-pair insertion (n.64_65insT). We estimate that variants in this region explain 0.41% of individuals with NDD. We demonstrate that RNU4-2 is highly expressed in the developing human brain, in contrast to its contiguous counterpart RNU4-1 and other U4 homologs, supporting RNU4-2's role as the primary U4 transcript in the brain. Overall, this work underscores the importance of non-coding genes in rare disorders. It will provide a diagnosis to thousands of individuals with NDD worldwide and pave the way for the development of effective treatments for these individuals.
View details for DOI 10.1101/2024.04.07.24305438
View details for PubMedID 38645094
View details for PubMedCentralID PMC11030480
-
Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease.
medRxiv : the preprint server for health sciences
2024
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, a 2.4-fold increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
View details for DOI 10.1101/2024.03.22.24304565
View details for PubMedID 38585781
View details for PubMedCentralID PMC10996727
-
Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.
medRxiv : the preprint server for health sciences
2024
Abstract
Fewer than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
View details for DOI 10.1101/2024.03.05.24303792
View details for PubMedID 38496498
View details for PubMedCentralID PMC10942501
-
RNA Sequencing in Disease Diagnosis.
Annual review of genomics and human genetics
2024
Abstract
RNA sequencing (RNA-seq) enables the accurate measurement of multiple transcriptomic phenotypes for modeling the impacts of disease variants. Advances in technologies, experimental protocols, and analysis strategies are rapidly expanding the application of RNA-seq to identify disease biomarkers, tissue- and cell-type-specific impacts, and the spatial localization of disease-associated mechanisms. Ongoing international efforts to construct biobank-scale transcriptomic repositories with matched genomic data across diverse population groups are further increasing the utility of RNA-seq approaches by providing large-scale normative reference resources. The availability of these resources, combined with improved computational analysis pipelines, has enabled the detection of aberrant transcriptomic phenotypes underlying rare diseases. Further expansion of these resources, across both somatic and developmental tissues, is expected to soon provide unprecedented insights to resolve disease origin, mechanism of action, and causal gene contributions, suggesting the continued high utility of RNA-seq in disease diagnosis. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 25 is August 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
View details for DOI 10.1146/annurev-genom-021623-121812
View details for PubMedID 38360541
-
Impact of genome build on RNA-seq interpretation and diagnostics.
medRxiv : the preprint server for health sciences
2024
Abstract
Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and for disease diagnosis. Prior studies have demonstrated that the choice of genome build affects variant interpretation and diagnostic yield for genomic analyses. To determine the extent to which genome build also affects transcriptomic analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes with detectable expression in only a subset of genome builds. Finally, we characterized how genome build affects the detection of outlier transcriptomic events. Combined, we provide a database of genes affected by build choice and recommend that transcriptomics-guided analyses and diagnoses be cross-referenced with these data for robustness.
View details for DOI 10.1101/2024.01.11.24301165
View details for PubMedID 38260490
View details for PubMedCentralID PMC10802764
-
Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders
Cell
2024; Published online September 30, 2024
View details for DOI 10.1016/j.cell.2024.09.014
-
Genetic architecture of cardiac dynamic flow volumes.
Nature genetics
2023
Abstract
Cardiac blood flow is a critical determinant of human health. However, the definition of its genetic architecture is limited by the technical challenge of capturing dynamic flow volumes from cardiac imaging at scale. We present DeepFlow, a deep-learning system to extract cardiac flow and volumes from phase-contrast cardiac magnetic resonance imaging. A mixed-linear model applied to 37,653 individuals from the UK Biobank reveals genome-wide significant associations across cardiac dynamic flow volumes spanning from aortic forward velocity to aortic regurgitation fraction. Mendelian randomization reveals a causal role for aortic root size in aortic valve regurgitation. Among the most significant contributing variants, localizing genes (near ELN, PRDM6 and ADAMTS7) are implicated in connective tissue and blood pressure pathways. Here we show that DeepFlow cardiac flow phenotyping at scale, combined with genotyping data, reinforces the contribution of connective tissue genes, blood pressure and root size to aortic valve function.
View details for DOI 10.1038/s41588-023-01587-5
View details for PubMedID 38082205
View details for PubMedCentralID 7612636
-
Organ aging signatures in the plasma proteome track health and disease.
Nature
2023; 624 (7990): 164-172
Abstract
Animal studies show that aging varies between individuals as well as between organs within an individual [1-4], but whether this is true in humans and its effect on age-related diseases is unknown. We used levels of human blood plasma proteins originating from specific organs to measure organ-specific aging differences in living individuals. Using machine learning models, we analysed aging in 11 major organs and estimated organ age reproducibly in five independent cohorts encompassing 5,676 adults across the human lifespan. We found that nearly 20% of the population shows strongly accelerated age in one organ and 1.7% are multi-organ agers. Accelerated organ aging confers 20-50% higher mortality risk, and organ-specific diseases relate to faster aging of those organs. We find that individuals with accelerated heart aging have a 250% increased heart failure risk, and that accelerated brain and vascular aging predict Alzheimer's disease (AD) progression independently from and as strongly as plasma pTau-181 [5], the current best blood-based biomarker for AD. Our models link vascular calcification, extracellular matrix alterations and synaptic protein shedding to early cognitive decline. We introduce a simple and interpretable method to study organ aging using plasma proteomics data, predicting diseases and aging effects.
View details for DOI 10.1038/s41586-023-06802-1
View details for PubMedID 38057571
View details for PubMedCentralID PMC10700136
-
Transcriptomics and chromatin accessibility in multiple African population samples.
bioRxiv : the preprint server for biology
2023
Abstract
Mapping the functional human genome and the impact of genetic variants is often limited to population samples of European descent. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts, including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs, we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand the population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and diseases.
View details for DOI 10.1101/2023.11.04.564839
View details for PubMedID 37986808
View details for PubMedCentralID PMC10659267
-
Multi-Omic Profiling of Macrophages Lacking Tet2 or Dnmt3a Reveals Mechanisms of Hyper-Inflammation in Clonal Hematopoiesis
AMER SOC HEMATOLOGY. 2023
View details for DOI 10.1182/blood-2023-187890
View details for Web of Science ID 001159306704186
-
Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases.
Nature genetics
2023
Abstract
Noncoding variants of presumed regulatory function contribute to the heritability of neuropsychiatric disease. A total of 2,221 noncoding variants connected to risk for ten neuropsychiatric disorders, including autism spectrum disorder, attention deficit hyperactivity disorder, bipolar disorder, borderline personality disorder, major depression, generalized anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder and schizophrenia, were studied in developing human neural cells. Integrating epigenomic and transcriptomic data with massively parallel reporter assays identified differentially active single-nucleotide variants (daSNVs) in specific neural cell types. Expression-gene mapping, network analyses and chromatin looping nominated candidate disease-relevant target genes modulated by these daSNVs. Follow-up integration of daSNV gene editing with clinical cohort analyses suggested that magnesium transport dysfunction may increase neuropsychiatric disease risk and indicated that common genetic pathomechanisms may mediate specific symptoms that are shared across multiple neuropsychiatric diseases.
View details for DOI 10.1038/s41588-023-01533-5
View details for PubMedID 37857935
View details for PubMedCentralID 4112379
-
The functional impact of rare variation across the regulatory cascade.
Cell genomics
2023; 3 (10): 100401
Abstract
Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how personal multi-omics can enable the identification of impactful rare variants using the Multi-Ethnic Study of Atherosclerosis, in which whole-genome sequencing, transcriptomes, methylomes, and proteomes were collected from several hundred individuals at two time points 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits, including height, schizophrenia, and Alzheimer's disease.
View details for DOI 10.1016/j.xgen.2023.100401
View details for PubMedID 37868038
View details for PubMedCentralID PMC10589633
-
Integrated single-cell multiome analysis reveals muscle fiber-type gene regulatory circuitry modulated by endurance exercise.
bioRxiv : the preprint server for biology
2023
Abstract
Endurance exercise is an important health modifier. We studied cell-type-specific adaptations of human skeletal muscle to acute endurance exercise using single-nucleus (sn) multiome sequencing in human vastus lateralis samples collected before and 3.5 hours after 40 min of exercise at 70% VO2max in four subjects, as well as in time-of-day-matched samples from two supine resting circadian controls. High-quality same-cell RNA-seq and ATAC-seq data were obtained from 37,154 nuclei comprising 14 cell types. Among muscle fiber types, both shared and fiber-type-specific regulatory programs were identified. Single-cell circuit analysis identified distinct adaptations in fast, slow and intermediate fibers as well as LUM-expressing FAP cells, involving a total of 328 transcription factors (TFs) acting at altered accessibility sites regulating 2,025 genes. These data and circuit mapping provide single-cell insight into the processes underlying tissue and metabolic remodeling responses to exercise.
View details for DOI 10.1101/2023.09.26.558914
View details for PubMedID 37808658
View details for PubMedCentralID PMC10557702
-
Author Correction: Africa-specific human genetic variation near CHD1L associates with HIV-1 load.
Nature
2023
View details for DOI 10.1038/s41586-023-06591-7
View details for PubMedID 37670157
-
Beyond the exome: What's next in diagnostic testing for Mendelian conditions.
American journal of human genetics
2023; 110 (8): 1229-1248
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
View details for DOI 10.1016/j.ajhg.2023.06.009
View details for PubMedID 37541186
-
Africa-specific human genetic variation near CHD1L associates with HIV-1 load.
Nature
2023
Abstract
HIV-1 remains a global health crisis [1], highlighting the need to identify new targets for therapies. Here, given the disproportionate HIV-1 burden and marked human genome diversity in Africa [2], we assessed the genetic determinants of control of set-point viral load in 3,879 people of African ancestries living with HIV-1 participating in the international collaboration for the genomics of HIV [3]. We identify a previously undescribed association signal on chromosome 1 where the peak variant associates with an approximately 0.3 log10-transformed copies per ml lower set-point viral load per minor allele copy and is specific to populations of African descent. The top associated variant is intergenic and lies between a long intergenic non-coding RNA (LINC00624) and the coding gene CHD1L, which encodes a helicase that is involved in DNA repair [4]. Infection assays in iPS cell-derived macrophages and other immortalized cell lines showed increased HIV-1 replication in CHD1L-knockdown and CHD1L-knockout cells. We provide evidence from population genetic studies that Africa-specific genetic variation near CHD1L associates with HIV replication in vivo. Although experimental studies suggest that CHD1L is able to limit HIV infection in some cell types in vitro, further investigation is required to understand the mechanisms underlying our observations, including any potential indirect effects of CHD1L on HIV spread in vivo that our cell-based assays cannot recapitulate.
View details for DOI 10.1038/s41586-023-06370-4
View details for PubMedID 37532928
View details for PubMedCentralID 3723635
-
Molecular quantitative trait loci
NATURE REVIEWS METHODS PRIMERS
2023; 3 (1)
View details for DOI 10.1038/s43586-022-00188-6
View details for Web of Science ID 000922834900001
-
Beyond the exome: what's next in diagnostic testing for Mendelian conditions.
ArXiv
2023
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA or RNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to a consortium such as GREGoR, which is focused on elucidating the underlying cause of rare unsolved genetic disorders.
View details for DOI 10.1002/ajmg.a.63053
View details for PubMedID 36713248
View details for PubMedCentralID PMC9882576
-
The mitochondrial multi-omic response to exercise training across tissues.
bioRxiv : the preprint server for biology
2023
Abstract
Mitochondria are adaptable organelles with diverse cellular functions critical to whole-body metabolic homeostasis. While chronic endurance exercise training is known to alter mitochondrial activity, these adaptations have not yet been systematically characterized. Here, the Molecular Transducers of Physical Activity Consortium (MoTrPAC) mapped the longitudinal, multi-omic changes in mitochondrial analytes across 19 tissues in male and female rats endurance trained for 1, 2, 4 or 8 weeks. Training elicited substantial changes in the adrenal gland, brown adipose, colon, heart and skeletal muscle, while we detected mild responses in the brain, lung, small intestine and testes. The colon response was characterized by non-linear dynamics that resulted in upregulation of mitochondrial function that was more prominent in females. Brown adipose and adrenal tissues were characterized by substantial downregulation of mitochondrial pathways. Training induced a previously unrecognized robust upregulation of mitochondrial protein abundance and acetylation in the liver, and a concomitant shift in lipid metabolism. The striated muscles demonstrated a highly coordinated response to increase oxidative capacity, with the majority of changes occurring in protein abundance and post-translational modifications. We identified exercise-upregulated networks that are downregulated in human type 2 diabetes and liver cirrhosis. In both cases, HSD17B10, a central dehydrogenase in multiple metabolic pathways and mitochondrial tRNA maturation, was the main hub. In summary, we provide a multi-omic, cross-tissue atlas of the mitochondrial response to training and identify candidates for prevention of disease-associated mitochondrial dysfunction.
View details for DOI 10.1101/2023.01.13.523698
View details for PubMedID 36711881
View details for PubMedCentralID PMC9882193
-
Multiomic identification of key transcriptional regulatory programs during endurance exercise training.
bioRxiv : the preprint server for biology
2023
Abstract
Transcription factors (TFs) play a key role in regulating gene expression and responses to stimuli. We conducted an integrated analysis of chromatin accessibility and RNA expression across various rat tissues following endurance exercise training (EET) to map epigenomic changes to transcriptional changes and determine key TFs involved. We uncovered tissue-specific changes across both omic layers, including highly correlated differentially accessible regions (DARs) and differentially expressed genes (DEGs). We identified open chromatin regions associated with DEGs (DEGaPs) and found tissue-specific and genomic feature-specific TF motif enrichment patterns among both DARs and DEGaPs. Accessible promoters of up- vs. downregulated DEGs per tissue showed distinct TF enrichment patterns. Further, some EET-induced TFs in skeletal muscle were either validated at the proteomic level (MEF2C and NUR77) or correlated with exercise-related phenotypic changes. We provide an in-depth analysis of the epigenetic and trans-factor-dependent processes governing gene expression during EET.
View details for DOI 10.1101/2023.01.10.523450
View details for PubMedID 36711841
-
RNAget: an API to securely retrieve RNA quantifications.
Bioinformatics (Oxford, England)
2023; 39 (4)
Abstract
SUMMARY: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq. AVAILABILITY AND IMPLEMENTATION: https://ga4gh-rnaseq.github.io/schema/docs/index.html.
View details for DOI 10.1093/bioinformatics/btad126
View details for PubMedID 36897015
-
Methylation differences in Alzheimer's disease neuropathologic change in the aged human brain.
Acta neuropathologica communications
2022; 10 (1): 174
Abstract
Alzheimer's disease (AD) is the most common cause of dementia, with advancing age as its strongest risk factor. AD neuropathologic change (ADNC) is known to be associated with numerous DNA methylation changes in the human brain, but the oldest old (> 90 years) have so far been underrepresented in epigenetic studies of ADNC. Our study participants were individuals aged over 90 years (n = 47) from The 90+ Study. We analyzed DNA methylation from bulk samples in eight precisely dissected regions of the human brain: middle frontal gyrus, cingulate gyrus, entorhinal cortex, dentate gyrus, CA1, substantia nigra, locus coeruleus and cerebellar cortex. We deconvolved our bulk data into cell-type-specific (CTS) signals using computational methods. CTS methylation differences were analyzed across different levels of ADNC. The largest number of ADNC-related methylation differences was found in the dentate gyrus, a region that has so far been underrepresented in large-scale multi-omic studies. In neurons of the dentate gyrus, DNA methylation significantly differed with increased burden of amyloid beta (Aβ) plaques at 5897 promoter regions of protein-coding genes. Amongst these, higher Aβ plaque burden was associated with promoter hypomethylation of the Presenilin enhancer 2 (PEN-2) gene, one of the rate-limiting genes in the formation of gamma-secretase, a multicomponent complex that is responsible in part for the endoproteolytic cleavage of amyloid precursor protein into Aβ peptides. In addition to novel ADNC-related DNA methylation changes, we present the most detailed array-based methylation survey of the old aged human brain to date. Our open-sourced dataset can serve as a brain region reference panel for future studies and help advance research in aging and neurodegenerative diseases.
View details for DOI 10.1186/s40478-022-01470-0
View details for PubMedID 36447297
View details for PubMedCentralID PMC9710143
-
Deep learning-assisted genome-wide characterization of massively parallel reporter assays.
Nucleic acids research
2022
Abstract
Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observed that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.
View details for DOI 10.1093/nar/gkac990
View details for PubMedID 36350674
-
RNA editing underlies genetic risk of common inflammatory diseases.
Nature
2022
Abstract
A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To achieve this, quantitative trait locus (QTL) mapping of genetic variants with intermediate molecular phenotypes such as gene expression and splicing has been widely adopted [1,2]. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear [3,4]. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses [5-11], is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed by expected inverted repeat Alu elements as well as unexpected, highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation [7-9]. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.
View details for DOI 10.1038/s41586-022-05052-x
View details for PubMedID 35922514
-
Temporal dynamics of the multi-omic response to endurance exercise training across tissues
ELSEVIER. 2022: S31
View details for DOI 10.1016/j.mcpro.2022.100313
View details for Web of Science ID 000898188800027
-
Integration of rare expression outlier-associated variants improves polygenic risk prediction.
American journal of human genetics
2022
Abstract
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
View details for DOI 10.1016/j.ajhg.2022.04.015
View details for PubMedID 35588732
-
Multiple causal variants underlie genetic associations in humans.
Science (New York, N.Y.)
2022; 375 (6586): 1247-1254
Abstract
Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.
View details for DOI 10.1126/science.abj5117
View details for PubMedID 35298243
-
Integration of genetic colocalizations with physiological and pharmacological perturbations identifies cardiometabolic disease genes.
Genome medicine
2022; 14 (1): 31
Abstract
BACKGROUND: Identification of causal genes for polygenic human diseases has been extremely challenging, and our understanding of how physiological and pharmacological stimuli modulate genetic risk at disease-associated loci is limited. Specifically, insulin resistance (IR), a common feature of cardiometabolic disease, including type 2 diabetes, obesity, and dyslipidemia, lacks well-powered genome-wide association studies (GWAS), and therefore, few associated loci and causal genes have been identified. METHODS: Here, we perform and integrate linkage disequilibrium (LD)-adjusted colocalization analyses across nine cardiometabolic traits (fasting insulin, fasting glucose, insulin sensitivity, insulin sensitivity index, type 2 diabetes, triglycerides, high-density lipoprotein, body mass index, and waist-hip ratio) combined with expression and splicing quantitative trait loci (eQTLs and sQTLs) from five metabolically relevant human tissues (subcutaneous and visceral adipose, skeletal muscle, liver, and pancreas). To elucidate the upstream regulators and functional mechanisms for these genes, we integrate their transcriptional responses to 21 relevant physiological and pharmacological perturbations in human adipocytes, hepatocytes, and skeletal muscle cells and map their protein-protein interactions. RESULTS: We identify 470 colocalized loci and prioritize 207 loci with a single colocalized gene. Patterns of shared colocalizations across traits and tissues highlight different potential roles for colocalized genes in cardiometabolic disease and distinguish several genes involved in pancreatic beta-cell function from others with a more direct role in skeletal muscle, liver, and adipose tissues. At the loci with a single colocalized gene, 42 of these genes were regulated by insulin and 35 by glucose in perturbation experiments, including 17 regulated by both. Other metabolic perturbations regulated the expression of 30 more genes not regulated by glucose or insulin, pointing to other potential upstream regulators of candidate causal genes. CONCLUSIONS: Our use of transcriptional responses under metabolic perturbations to contextualize genetic associations from our custom colocalization approach provides a list of likely causal genes and their upstream regulators in the context of IR-associated cardiometabolic risk.
View details for DOI 10.1186/s13073-022-01036-8
View details for PubMedID 35292083
-
Integration of genetic colocalizations with physiological and pharmacological perturbations identifies cardiometabolic disease genes
W B SAUNDERS CO-ELSEVIER INC. 2022: S24-S25
View details for DOI 10.1016/j.metabol.2021.155025
View details for Web of Science ID 000778891500062
-
Towards transcriptomics as a primary tool for rare disease investigation.
Cold Spring Harbor molecular case studies
2022
Abstract
In the past five years, transcriptome sequencing (RNA-seq) has steadily emerged as a complementary assay for rare disease diagnosis and discovery. In this perspective, we summarize several recent developments and challenges in the use of RNA-seq for rare disease investigation. Using an accessible patient sample, such as blood, skin, or muscle, RNA-seq enables the assay of expressed RNA transcripts. Analysis of RNA-seq allows the identification of aberrant or outlier gene expression and alternative splicing as functional evidence to support rare disease study and diagnosis. Further, many types of variant effects can be profiled beyond coding variants, as the consequences of non-coding variants that impact gene expression and splicing can be directly observed. This is particularly apparent for structural variants, which disproportionately underlie outlier gene expression, and for splicing variants, where RNA-seq can both measure aberrant canonical splicing and detect deep intronic effects. However, a major potential limitation of RNA-seq in rare disease investigation is the developmental and cell type-specificity of gene expression: a pathogenic variant's effect may be limited to a specific spatiotemporal context, and access to a patient's tissue sample from the relevant tissue and timing of disease expression may not be possible. We speculate that as advances in computational methods and emerging experimental techniques overcome both developmental and cell type-specificity, there will be broadening use of RNA sequencing and multi-omics in rare disease diagnosis and delivery of precision health.
View details for DOI 10.1101/mcs.a006198
View details for PubMedID 35217565
-
Lymphoid blast transformation in an MPN with BCR-JAK2 treated with ruxolitinib: putative mechanisms of resistance.
Blood advances
2021; 5 (17): 3492-3496
Abstract
The basis for acquired resistance to JAK inhibition in patients with JAK2-driven hematologic malignancies is not well understood. We report a patient with a myeloproliferative neoplasm (MPN) with a BCR activator of RhoGEF and GTPase (BCR)-JAK2 fusion with initial hematologic response to ruxolitinib who rapidly developed B-lymphoid blast transformation. We analyzed pre-ruxolitinib and blast transformation samples using genome sequencing, DNA mate-pair sequencing (MPseq), RNA sequencing (RNA-seq), and chromosomal microarray to characterize possible mechanisms of resistance. No resistance mutations in the BCR-JAK2 fusion gene or transcript were identified, and fusion transcript expression levels remained stable. However, at the time of blast transformation, MPseq detected a new IKZF1 copy-number loss, which is predicted to result in loss of normal IKZF1 protein translation. RNA-seq revealed significant upregulation of genes negatively regulated by IKZF1, including IL7R and CRLF2. Disease progression was also characterized by adaptation to an activated B-cell receptor (BCR)-like signaling phenotype, with marked upregulation of genes such as CD79A, CD79B, IGLL1, VPREB1, BLNK, ZAP70, RAG1, and RAG2. In summary, IKZF1 deletion and a switch from cytokine dependence to activated BCR-like signaling phenotype represent putative mechanisms of ruxolitinib resistance in this case, recapitulating preclinical data on resistance to JAK inhibition in CRLF2-rearranged Philadelphia chromosome-like acute lymphoblastic leukemia.
View details for DOI 10.1182/bloodadvances.2020004174
View details for PubMedID 34505882
-
Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution.
Cell
2021
Abstract
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
View details for DOI 10.1016/j.cell.2021.08.025
View details for PubMedID 34534445
-
The role of Sp140 revealed in IgE and mast cell responses in Collaborative Cross mice.
JCI insight
2021; 6 (12)
Abstract
Mouse IgE and mast cell (MC) functions have been studied primarily using inbred strains. Here, we (a) identified effects of genetic background on mouse IgE and MC phenotypes, (b) defined the suitability of various strains for studying IgE and MC functions, and (c) began to study potentially novel genes involved in such functions. We screened 47 Collaborative Cross (CC) strains, as well as C57BL/6J and BALB/cJ mice, for strength of passive cutaneous anaphylaxis (PCA) and responses to the intestinal parasite Strongyloides venezuelensis (S.v.). CC mice exhibited diverse PCA strengths and S.v. responses. Among strains tested, C57BL/6J and CC027 mice showed, respectively, moderate and uniquely potent MC activity. Quantitative trait locus analysis and RNA sequencing of BM-derived cultured MCs (BMCMCs) from CC027 mice suggested Sp140 as a candidate gene for MC activation. siRNA-mediated knock-down of Sp140 in BMCMCs decreased IgE-dependent histamine release and cytokine production. Our results demonstrated marked variations in IgE and MC activity in vivo, and in responses to S.v., across CC strains. C57BL/6J and CC027 represent useful models for studying MC functions. Additionally, we identified Sp140 as a gene that contributes to IgE-dependent MC activation.
View details for DOI 10.1172/jci.insight.146572
View details for PubMedID 34156030
-
Identification of putative causal loci in whole-genome sequencing data via knockoff statistics.
Nature communications
2021; 12 (1): 3152
Abstract
The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium, thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program, we show that, compared with conventional association tests, our method can lead to substantially more discoveries.
View details for DOI 10.1038/s41467-021-22889-4
View details for PubMedID 34035245
-
Compound heterozygous KCTD7 variants in progressive myoclonus epilepsy.
Journal of neurogenetics
2021: 1–10
Abstract
KCTD7 is a member of the potassium channel tetramerization domain-containing protein family and has been associated with progressive myoclonic epilepsy (PME), characterized by myoclonus, epilepsy, and neurological deterioration. Here we report four affected individuals from two unrelated families in which we identified KCTD7 compound heterozygous single nucleotide variants through exome sequencing. RNAseq was used to detect a non-annotated splicing junction created by a synonymous variant in the second family. Whole-cell patch-clamp analysis of neuroblastoma cells overexpressing the patients' variant alleles demonstrated aberrant potassium regulation. While all four patients experienced many of the common clinical features of PME, they also showed variable phenotypes not previously reported, including dysautonomia, brain pathology findings including a significantly reduced thalamus, and the lack of myoclonic seizures. To gain further insight into the pathogenesis of the disorder, zinc finger nucleases were used to generate kctd7 knockout zebrafish. Kctd7 homozygous mutants showed global dysregulation of gene expression and increased transcription of c-fos, which has previously been correlated with seizure activity in animal models. Together these findings expand the known phenotypic spectrum of KCTD7-associated PME, report a new animal model for future studies, and contribute valuable insights into the disease.
View details for DOI 10.1080/01677063.2021.1892095
View details for PubMedID 33970744
-
Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.
Cell
2021
Abstract
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.
View details for DOI 10.1016/j.cell.2021.03.050
View details for PubMedID 33864768
-
Functional and structural analysis of cytokine selective IL6ST defects that cause recessive hyper-IgE syndrome.
The Journal of allergy and clinical immunology
2021
Abstract
BACKGROUND: Biallelic variants in IL6ST cause a recessive form of hyper-IgE syndrome (HIES) characterized by high IgE, eosinophilia, defective acute phase response, susceptibility to bacterial infections and skeletal abnormalities due to cytokine selective loss-of-function in GP130 with defective IL-6 and IL-11, variable OSM and IL-27 but sparing LIF signaling. OBJECTIVE: To understand the functional and structural impact of recessive HIES-associated IL6ST variants. METHODS: We investigated a patient with HIES using exome, genome and RNA sequencing. Functional assays assessed IL-6, IL-11, IL-27, OSM, LIF, CT-1, CLC, and CNTF signaling. Molecular dynamic simulations and structural modeling of GP130 cytokine receptor complexes were performed. RESULTS: We identify a patient with compound heterozygous novel variants in IL6ST (missense variant p.Ala517Pro and exon-skipping null variant p.Gly484_Pro518delinsArg). The p.Ala517Pro variant results in a more profound IL-6 and IL-11 dominated signaling defect compared to the previously identified recessive IL6ST variants p.Asn404Tyr and p.Pro498Leu. Molecular dynamics simulations suggest that the p.Ala517Pro and p.Asn404Tyr variants result in increased flexibility of the extracellular membrane-proximal domains of GP130. We propose a structural model that explains the cytokine selectivity of pathogenic IL6ST variants that result in recessive HIES. The variants destabilize the hexameric cytokine receptor complexes, whereas the trimeric LIF-GP130-LIFR complex remains stable by an additional membrane-proximal interaction. Deletion of this membrane-proximal interaction site in GP130 consequently causes additional defective LIF signaling and Stuve-Wiedemann syndrome. CONCLUSION: Our data provide a structural basis to understand clinical phenotypes in patients with IL6ST variants.
View details for DOI 10.1016/j.jaci.2021.02.044
View details for PubMedID 33771552
-
Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics.
Nature genetics
2021
Abstract
Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.
View details for DOI 10.1038/s41588-021-00800-7
View details for PubMedID 33664507
-
Evaluating the Genomic Parameters Governing rAAV-Mediated Homologous Recombination
MOLECULAR THERAPY
2021; 29 (3): 1028–46
View details for DOI 10.1016/j.ymthe.2020.11.025
View details for Web of Science ID 000632042500016
-
Exploiting the GTEx resources to decipher the mechanisms at GWAS loci.
Genome biology
2021; 22 (1): 49
Abstract
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find that no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
View details for DOI 10.1186/s13059-020-02252-4
View details for PubMedID 33499903
-
Nonsense-mediated decay is highly stable across individuals and tissues.
American journal of human genetics
2021
Abstract
Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.
View details for DOI 10.1016/j.ajhg.2021.06.008
View details for PubMedID 34216550
-
An integrated approach to identify environmental modulators of genetic risk factors for complex traits.
American journal of human genetics
2021
Abstract
Complex traits and diseases can be influenced by both genetics and environment. However, given the large number of environmental stimuli and power challenges for gene-by-environment testing, it remains a critical challenge to identify and prioritize specific disease-relevant environmental exposures. We propose a framework for leveraging signals from transcriptional responses to environmental perturbations to identify disease-relevant perturbations that can modulate genetic risk for complex traits and inform the functions of genetic variants associated with complex traits. We perturbed human skeletal-muscle-, fat-, and liver-relevant cell lines with 21 perturbations affecting insulin resistance, glucose homeostasis, and metabolic regulation in humans and identified thousands of environmentally responsive genes. By combining these data with GWASs from 31 distinct polygenic traits, we show that the heritability of multiple traits is enriched in regions surrounding genes responsive to specific perturbations and, further, that environmentally responsive genes are enriched for associations with specific diseases and phenotypes from the GWAS Catalog. Overall, we demonstrate the advantages of large-scale characterization of transcriptional changes in diversely stimulated and pathologically relevant cells to identify disease-relevant perturbations.
View details for DOI 10.1016/j.ajhg.2021.08.014
View details for PubMedID 34582792
-
Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer's and Parkinson's diseases.
Nature genetics
2020
Abstract
Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.
View details for DOI 10.1038/s41588-020-00721-x
View details for PubMedID 33106633
-
The GTEx Consortium atlas of genetic regulatory effects across human tissues
SCIENCE
2020; 369 (6509): 1318-+
View details for DOI 10.1126/science.aaz1776
View details for Web of Science ID 000569840300041
-
Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise.
Cell
2020; 181 (7): 1464–74
Abstract
Exercise provides a robust physiological stimulus that evokes cross-talk among multiple tissues and that, when repeated regularly (i.e., training), improves physiological capacity, benefits numerous organ systems, and decreases the risk for premature mortality. However, a gap remains in identifying the detailed molecular signals induced by exercise that benefit health and prevent disease. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) was established to address this gap and generate a molecular map of exercise. Preclinical and clinical studies will examine the systemic effects of endurance and resistance exercise across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. From this multi-omic and bioinformatic analysis, a molecular map of exercise will be established. Altogether, MoTrPAC will provide a public database that is expected to enhance our understanding of the health benefits of exercise and to provide insight into how physical activity mitigates disease.
View details for DOI 10.1016/j.cell.2020.06.004
View details for PubMedID 32589957
-
Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats.
Nature communications
2020; 11 (1): 2928
Abstract
Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42× coverage) from 477 distinct individuals, which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well-characterized maps of SVs and STRs to date.
View details for DOI 10.1038/s41467-020-16481-5
View details for PubMedID 32522985
-
Properties of structural variants and short tandem repeats associated with gene expression and complex traits.
Nature communications
2020; 11 (1): 2927
Abstract
Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.
View details for DOI 10.1038/s41467-020-16482-4
View details for PubMedID 32522982
-
Transcriptional and Position Effect Contributions to rAAV-Mediated Gene Targeting
CELL PRESS. 2020: 290
View details for Web of Science ID 000530089301198
-
Molecular Choreography of Acute Exercise.
Cell
2020; 181 (5): 1112–30.e16
Abstract
Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.
View details for DOI 10.1016/j.cell.2020.04.043
View details for PubMedID 32470399
-
Evaluating the genomic parameters governing rAAV-mediated homologous recombination.
Molecular therapy : the journal of the American Society of Gene Therapy
2020
Abstract
Recombinant AAV vectors have the unique ability to promote targeted integration of transgenes via homologous recombination at specified genomic sites reaching frequencies of 0.1-1%. We studied genomic parameters that influence targeting efficiencies on a large scale. To do this, we generated more than 1000 engineered, doxycycline-inducible target sites in the human HAP1 cell line and infected this polyclonal population with a library of AAV-DJ targeting vectors each carrying a unique barcode. The heterogeneity of barcode integration at each target site provided an assessment of targeting efficiency at that locus. We compared targeting efficiency with and without target site transcription for identical chromosomal positions. Targeting efficiency was enhanced by target site transcription, while chromatin accessibility was associated with an increased likelihood of targeting. ChromHMM chromatin states characterizing transcription and enhancers in wildtype K562 cells were also associated with increased AAV-HR efficiency with and without target site transcription, respectively. Furthermore, the amenability of a site to targeting was influenced by the endogenous transcriptional level of intersecting genes. These results define important parameters that may not only assist in designing optimal targeting vectors for genome editing, but also provide new insights into the mechanism of AAV-mediated homologous recombination.
View details for DOI 10.1016/j.ymthe.2020.11.025
View details for PubMedID 33248247
-
The impact of sex on gene expression across human tissues.
Science (New York, N.Y.)
2020; 369 (6509)
Abstract
Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.
View details for DOI 10.1126/science.aba3066
View details for PubMedID 32913072
-
Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx.
Genome biology
2020; 21 (1): 233
Abstract
Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization. Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up. We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.
View details for DOI 10.1186/s13059-020-02113-0
View details for PubMedID 32912333
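The two adjustments above differ only in which ancestry covariate enters the eQTL model. With global ancestry, the covariate is a genome-wide ancestry proportion per individual. With local ancestry, it is the count of non-European haplotypes at the tested locus (0, 1, or 2). The sketch below fits `expression ~ 1 + ancestry + genotype` for one gene-variant pair using the Frisch-Waugh-Lovell theorem. It assumes a single covariate for brevity, whereas real cis-eQTL pipelines include many more (PCs, PEER factors, sex, and so on). `ancestry_adjusted_eqtl` is a hypothetical helper, not the study's code.

```python
import math

def _residualize(values, covariate):
    """Residuals of `values` after ordinary least squares on an intercept plus `covariate`."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    s_cc = sum((c - mean_c) ** 2 for c in covariate)
    slope = (sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values)) / s_cc
             if s_cc else 0.0)
    return [v - mean_v - slope * (c - mean_c) for c, v in zip(covariate, values)]

def ancestry_adjusted_eqtl(expression, genotype, ancestry):
    """Fit expression ~ 1 + ancestry + genotype; return (genotype effect, t statistic).

    By Frisch-Waugh-Lovell, the genotype coefficient equals the slope from regressing
    ancestry-residualized expression on ancestry-residualized genotype, and the
    residuals of that reduced regression equal those of the full model.
    """
    r_expr = _residualize(expression, ancestry)
    r_geno = _residualize(genotype, ancestry)
    s_gg = sum(g * g for g in r_geno)
    if s_gg == 0:
        raise ValueError("genotype is constant after ancestry adjustment")
    beta = sum(g * e for g, e in zip(r_geno, r_expr)) / s_gg
    resid = [e - beta * g for g, e in zip(r_geno, r_expr)]
    df = len(expression) - 3  # parameters: intercept, ancestry, genotype
    sigma2 = sum(r * r for r in resid) / df
    return beta, beta / math.sqrt(sigma2 / s_gg)
```

If genotype tracks ancestry but expression depends only on ancestry, an unadjusted test would report a spurious eQTL. After adjustment, the genotype effect shrinks toward zero.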
-
Transcriptomic signatures across human tissues identify functional rare genetic variation.
Science (New York, N.Y.)
2020; 369 (6509)
Abstract
Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
View details for DOI 10.1126/science.aaz5900
View details for PubMedID 32913073
-
FAM13A affects body fat distribution and adipocyte function.
Nature communications
2020; 11 (1): 1465
Abstract
Genetic variation in the FAM13A (Family with Sequence Similarity 13 Member A) locus has been associated with several glycemic and metabolic traits in genome-wide association studies (GWAS). Here, we demonstrate that in humans, FAM13A alleles are associated with increased FAM13A expression in subcutaneous adipose tissue (SAT) and an insulin resistance-related phenotype (e.g. higher waist-to-hip ratio and fasting insulin levels, but lower body fat). In human adipocyte models, knockdown of FAM13A in preadipocytes accelerates adipocyte differentiation. In mice, Fam13a knockout (KO) animals have a lower visceral to subcutaneous fat (VAT/SAT) ratio after high-fat diet challenge, in comparison to their wild-type counterparts. Subcutaneous adipocytes in KO mice show a size distribution shift toward an increased number of smaller adipocytes, along with an improved adipogenic potential. Our results indicate that GWAS-associated variants within the FAM13A locus alter adipose FAM13A expression, which, in turn, regulates adipocyte differentiation and contributes to changes in body fat distribution.
View details for DOI 10.1038/s41467-020-15291-z
View details for PubMedID 32193374
-
A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation.
Cell host & microbe
2019
Abstract
Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates. Additionally, we identified thousands of mobility genes, a subset of which have unknown functions, opening avenues for exploration. Future application of MGEfinder to commensal bacteria will further illuminate bacterial adaptation and evolution.
View details for DOI 10.1016/j.chom.2019.10.022
View details for PubMedID 31862382
-
Genetic regulation of gene expression and splicing during a 10-year period of human aging.
Genome biology
2019; 20 (1): 230
Abstract
BACKGROUND: Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age. RESULTS: We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at a population and individual level. We observe that individuals are more similar to their own expression profiles later in life than profiles of other individuals their own age. We identify 1291 and 294 genes differentially expressed and alternatively spliced with age, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age. CONCLUSIONS: These findings demonstrate that, although the transcriptome and its genetic regulation are mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.
View details for DOI 10.1186/s13059-019-1840-y
View details for PubMedID 31684996
-
COMPREHENSIVE RNA ANALYSIS OF CEREBROSPINAL FLUID FROM LEPTOMENINGEAL METASTASES
OXFORD UNIV PRESS INC. 2019: 62
View details for Web of Science ID 000509478701108
-
Uganda Genome Resource Enables Insights into Population History and Genomic Discovery in Africa.
Cell
2019; 179 (4): 984
Abstract
Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.
View details for DOI 10.1016/j.cell.2019.10.004
View details for PubMedID 31675503
-
Atheroprotective roles of smooth muscle cell phenotypic modulation and the TCF21 disease gene as revealed by single-cell analysis.
Nature medicine
2019
Abstract
In response to various stimuli, vascular smooth muscle cells (SMCs) can de-differentiate, proliferate and migrate in a process known as phenotypic modulation. However, the phenotype of modulated SMCs in vivo during atherosclerosis and the influence of this process on coronary artery disease (CAD) risk have not been clearly established. Using single-cell RNA sequencing, we comprehensively characterized the transcriptomic phenotype of modulated SMCs in vivo in atherosclerotic lesions of both mouse and human arteries and found that these cells transform into unique fibroblast-like cells, termed 'fibromyocytes', rather than into a classical macrophage phenotype. SMC-specific knockout of TCF21, a causal CAD gene, markedly inhibited SMC phenotypic modulation in mice, leading to the presence of fewer fibromyocytes within lesions as well as within the protective fibrous cap of the lesions. Moreover, TCF21 expression was strongly associated with SMC phenotypic modulation in diseased human coronary arteries, and higher levels of TCF21 expression were associated with decreased CAD risk in human CAD-relevant tissues. These results establish a protective role for both TCF21 and SMC phenotypic modulation in this disease.
View details for DOI 10.1038/s41591-019-0512-5
View details for PubMedID 31359001
-
Identifying causal variants and genes using functional genomics in specialized cell types and contexts.
Human genetics
2019
Abstract
A central goal in human genetics is the identification of variants and genes that influence the risk of polygenic diseases. In the past decade, genome-wide association studies (GWAS) have identified tens of thousands of genetic loci associated with various diseases. Since the majority of such loci lie within non-coding regions and have many candidate variants in linkage disequilibrium, it has been challenging to accurately identify specific causal variants and genes. To aid in their discovery, a variety of statistical and experimental approaches have been developed. These approaches often borrow information from functional genomics assays such as ATAC-seq, ChIP-seq and RNA-seq to annotate functional variants and identify regulatory relationships between variants and genes. While such approaches are powerful, given the diversity of cell types and environments, it is paramount to select disease-relevant contexts for follow-up analyses. In this review, we discuss the latest developments, challenges, and best practices for determining the causal mechanisms of polygenic disease risk variants with functional genomics data from specialized cell types.
View details for DOI 10.1007/s00439-019-02044-2
View details for PubMedID 31317254
-
Disease mechanisms elucidated by genetic regulation of human RPE gene expression
ASSOC RESEARCH VISION OPHTHALMOLOGY INC. 2019
View details for Web of Science ID 000488628104139
-
Genetic analyses of human fetal retinal pigment epithelium gene expression suggest ocular disease mechanisms.
Communications biology
2019; 2 (1): 186
Abstract
The retinal pigment epithelium (RPE) serves vital roles in ocular development and retinal homeostasis but has limited representation in large-scale functional genomics datasets. Understanding how common human genetic variants affect RPE gene expression could elucidate the sources of phenotypic variability in selected monogenic ocular diseases and pinpoint causal genes at genome-wide association study (GWAS) loci. We interrogated the genetics of gene expression of cultured human fetal RPE (fRPE) cells under two metabolic conditions and discovered hundreds of shared or condition-specific expression or splice quantitative trait loci (e/sQTLs). Co-localizations of fRPE e/sQTLs with age-related macular degeneration (AMD) and myopia GWAS data suggest new candidate genes, and mechanisms by which a common RDH5 allele contributes to both increased AMD risk and decreased myopia risk. Our study highlights the unique transcriptomic characteristics of fRPE and provides a resource to connect e/sQTLs in a critical ocular cell type to monogenic and complex eye disorders.
View details for DOI 10.1038/s42003-019-0430-6
View details for PubMedID 31925026
-
Abundant associations with gene expression complicate GWAS follow-up
NATURE GENETICS
2019; 51 (5): 768–69
View details for DOI 10.1038/s41588-019-0404-0
View details for Web of Science ID 000466842000002
-
Identification of 22 novel loci associated with urinary biomarkers of albumin, sodium, and potassium excretion
KIDNEY INTERNATIONAL
2019; 95 (5): 1197–1208
View details for DOI 10.1016/j.kint.2018.12.017
View details for Web of Science ID 000465213400023
-
Transcriptional and Position Effect Contributions to rAAV-Mediated Gene Targeting
CELL PRESS. 2019: 294
View details for Web of Science ID 000464381003086
-
Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays
ARCHIVES OF PATHOLOGY & LABORATORY MEDICINE
2019; 143 (4): 463–71
View details for DOI 10.5858/arpa.2018-0336-CP
View details for Web of Science ID 000462602800014
-
A toolkit for genetics providers in follow-up of patients with non-diagnostic exome sequencing
JOURNAL OF GENETIC COUNSELING
2019; 28 (2): 213–28
View details for DOI 10.1002/jgc4.1119
View details for Web of Science ID 000463993600005
-
Identification of 22 novel loci associated with urinary biomarkers of albumin, sodium, and potassium excretion.
Kidney international
2019
Abstract
Urine biomarkers reflecting kidney function and handling of dietary sodium and potassium are strongly associated with several common diseases including chronic kidney disease, cardiovascular disease, and diabetes mellitus. Knowledge about the genetic determinants of these biomarkers may shed light on pathophysiological mechanisms underlying the development of these diseases. We performed genome-wide association studies of urinary albumin:creatinine ratio (UACR), urinary potassium:creatinine ratio (UK/UCr), urinary sodium:creatinine ratio (UNa/UCr) and urinary sodium:potassium ratio (UNa/UK) in up to 218,450 (discovery) and 109,166 (replication) unrelated individuals of European ancestry from the UK Biobank. Further, we explored genetic correlations, tissue-specific gene expression, and possible genes implicated in the regulation of these biomarkers. After replication, we identified 19 genome-wide significant independent loci associated with UACR, 6 each with UK/UCr and UNa/UCr, and 4 with UNa/UK. In addition to 22 novel associations, we confirmed several established associations, including between the CUBN locus and microalbuminuria. We detected high pairwise genetic correlation across the urinary biomarkers, and between their levels and several physiological measurements. We highlight GIPR, a potential diabetes drug target, as possibly implicated in the genetic control of urinary potassium excretion, and NRBP1, a locus associated with gout, as plausibly involved in sodium and albumin excretion. Overall, we identified 22 novel genome-wide significant associations with urinary biomarkers and confirmed several previously established associations, providing new insights into the genetic basis of these traits and their connection to chronic diseases.
View details for PubMedID 30910378
-
Abundant associations with gene expression complicate GWAS follow-up.
Nature genetics
2019; 51 (5): 768–69
View details for PubMedID 31043754
-
SEX DIFFERENCES AT THE MOLECULAR LEVEL: LESSONS FROM THE HUMAN TRANSCRIPTOME
ELSEVIER. 2019: 1034
View details for DOI 10.1016/j.euroneuro.2018.07.028
View details for Web of Science ID 000477708400028
-
A toolkit for genetics providers in follow-up of patients with non-diagnostic exome sequencing.
Journal of genetic counseling
2019; 28 (2): 213–28
Abstract
There are approximately 7,000 rare diseases affecting 25-30 million Americans, with 80% estimated to have a genetic basis. This presents a challenge for genetics practitioners to determine appropriate testing, make accurate diagnoses, and conduct up-to-date patient management. Exome sequencing (ES) is a comprehensive diagnostic approach, but only 25%-41% of the patients receive a molecular diagnosis. The remaining three-fifths to three-quarters of patients undergoing ES remain undiagnosed. The Stanford Center for Undiagnosed Diseases (CUD), a clinical site of the Undiagnosed Diseases Network, evaluates patients with undiagnosed and rare diseases using a combination of methods including ES. Frequently these patients have non-diagnostic ES results, but strategic follow-up techniques identify diagnoses in a subset. We present techniques used at the CUD that can be adopted by genetics providers in clinical follow-up of cases where ES is non-diagnostic. Solved case examples illustrate different types of non-diagnostic results and the additional techniques that led to a diagnosis. Frequent approaches include segregation analysis, data reanalysis, genome sequencing, additional variant identification, careful phenotype-disease correlation, confirmatory testing, and case matching. We also discuss prioritization of cases for additional analyses.
View details for PubMedID 30964584
-
Genetic analyses of human fetal retinal pigment epithelium gene expression suggest ocular disease mechanisms.
Communications biology
2019; 2: 186
Abstract
The retinal pigment epithelium (RPE) serves vital roles in ocular development and retinal homeostasis but has limited representation in large-scale functional genomics datasets. Understanding how common human genetic variants affect RPE gene expression could elucidate the sources of phenotypic variability in selected monogenic ocular diseases and pinpoint causal genes at genome-wide association study (GWAS) loci. We interrogated the genetics of gene expression of cultured human fetal RPE (fRPE) cells under two metabolic conditions and discovered hundreds of shared or condition-specific expression or splice quantitative trait loci (e/sQTLs). Co-localizations of fRPE e/sQTLs with age-related macular degeneration (AMD) and myopia GWAS data suggest new candidate genes, and mechanisms by which a common RDH5 allele contributes to both increased AMD risk and decreased myopia risk. Our study highlights the unique transcriptomic characteristics of fRPE and provides a resource to connect e/sQTLs in a critical ocular cell type to monogenic and complex eye disorders.
View details for DOI 10.1038/s42003-019-0430-6
View details for PubMedID 31123710
-
Pathologic gene network rewiring implicates PPP1R3A as a central regulator in pressure overload heart failure.
Nature communications
2019; 10 (1): 2760
Abstract
Heart failure is a leading cause of mortality, yet our understanding of the genetic interactions underlying this disease remains incomplete. Here, we harvest 1352 healthy and failing human hearts directly from transplant center operating rooms, and obtain genome-wide genotyping and gene expression measurements for a subset of 313. We build failing and non-failing cardiac regulatory gene networks, revealing important regulators and cardiac expression quantitative trait loci (eQTLs). PPP1R3A emerges as a regulator whose network connectivity changes significantly between health and disease. RNA sequencing after PPP1R3A knockdown validates network-based predictions, and highlights metabolic pathway regulation associated with increased cardiomyocyte size and perturbed respiratory metabolism. Mice lacking PPP1R3A are protected against pressure-overload heart failure. We present a global gene interaction map of the human heart failure transition, identify previously unreported cardiac eQTLs, and demonstrate the discovery potential of disease-specific networks through the description of PPP1R3A as a central regulator in heart failure.
View details for DOI 10.1038/s41467-019-10591-5
View details for PubMedID 31235787
-
Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts.
Nature medicine
2019
Abstract
It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases. This includes muscle biopsies from patients with undiagnosed rare muscle disorders, and cultured fibroblasts from patients with mitochondrial disorders. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.
View details for DOI 10.1038/s41591-019-0457-8
View details for PubMedID 31160820
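The approach above rests on comparing each patient against a large control set. The simplest version of that comparison computes, for every gene, a z-score of the patient's value against the control distribution and flags genes beyond a cutoff. The sketch below assumes expression is already normalized (for example, log-scaled) per gene. The 3-SD cutoff is an illustrative assumption, and the study's splicing signals, hidden-factor correction, and gene/variant filters are not shown. `expression_outliers` is a hypothetical helper.

```python
import statistics

def expression_outliers(patient, controls, threshold=3.0):
    """Flag genes where one patient's expression lies far outside the control range.

    patient:  {gene: normalized expression}
    controls: {gene: [normalized expression in each control]}
    Returns (gene, z) pairs with |z| >= threshold, sorted by |z| descending.
    """
    hits = []
    for gene, value in patient.items():
        ref = controls.get(gene)
        if not ref or len(ref) < 2:
            continue  # need at least two controls to estimate a spread
        sd = statistics.stdev(ref)
        if sd == 0:
            continue  # z-score undefined for invariant genes
        z = (value - statistics.mean(ref)) / sd
        if abs(z) >= threshold:
            hits.append((gene, z))
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)
```

Strongly negative z-scores point to candidate loss-of-expression events, for example from nonsense-mediated decay. In practice, such candidates would then be intersected with rare variants in the patient's exome or genome.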
-
Diagnosing rare diseases after the exome.
Cold Spring Harbor molecular case studies
2018; 4 (6)
Abstract
High-throughput sequencing has ushered in a diversity of approaches for identifying genetic variants and understanding genome structure and function. When applied to individuals with rare genetic diseases, these approaches have greatly accelerated gene discovery and patient diagnosis. Over the past decade, exome sequencing has emerged as a comprehensive and cost-effective approach to identify pathogenic variants in the protein-coding regions of the genome. However, for individuals in whom exome sequencing fails to identify a pathogenic variant, we discuss recent advances that are helping to reduce the diagnostic gap.
View details for PubMedID 30559314
-
Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays.
Archives of pathology & laboratory medicine
2018
Abstract
CONTEXT: Next-generation sequencing-based assays are being increasingly used in the clinical setting for the detection of somatic variants in solid tumors, but limited data are available regarding the interlaboratory performance of these assays. OBJECTIVE: To examine proficiency testing data from the initial College of American Pathologists (CAP) Next-Generation Sequencing Solid Tumor survey to report on laboratory performance. DESIGN: CAP proficiency testing results from 111 laboratories were analyzed for accuracy and associated assay performance characteristics. RESULTS: The overall accuracy observed for all variants was 98.3%. Rare false-negative results could not be attributed to sequencing platform, selection method, or other assay characteristics. The median and average of the variant allele fractions reported by the laboratories were within 10% of those orthogonally determined by digital polymerase chain reaction for each variant. The median coverage reported at the variant sites ranged from 1922 to 3297. CONCLUSIONS: Laboratories demonstrated an overall accuracy of greater than 98% with high specificity when examining 10 clinically relevant somatic single-nucleotide variants with a variant allele fraction of 15% or greater. These initial data suggest excellent performance, but further ongoing studies are needed to evaluate the performance of lower variant allele fractions and additional variant types.
View details for PubMedID 30376374
-
Large-Scale Phenome-Wide Association Study of PCSK9 Variants Demonstrates Protection Against Ischemic Stroke
CIRCULATION-GENOMIC AND PRECISION MEDICINE
2018; 11 (7): e002162
Abstract
PCSK9 inhibition is a potent new therapy for hypercholesterolemia and cardiovascular disease. Although short-term clinical trial results have not demonstrated major adverse effects, long-term data will not be available for some time. Genetic studies in large biobanks offer a unique opportunity to predict drug effects and provide context for the evaluation of future clinical trial outcomes. We tested the association of the PCSK9 missense variant rs11591147 with predefined phenotypes and phenome-wide, in 337,536 individuals of British ancestry in the UK Biobank, with independent discovery and replication. Using a Bayesian statistical method, we leveraged phenotype correlations to evaluate the phenome-wide impact of PCSK9 inhibition with higher power at a finer resolution. The T allele of rs11591147 showed a protective effect on hyperlipidemia (odds ratio, 0.63±0.04; P=2.32×10⁻³⁸), coronary heart disease (odds ratio, 0.73±0.09; P=1.05×10⁻⁶), and ischemic stroke (odds ratio, 0.61±0.18; P=2.40×10⁻³) and was associated with increased type 2 diabetes mellitus risk adjusted for lipid-lowering medication status (odds ratio, 1.24±0.10; P=1.98×10⁻⁷). We did not observe associations with cataracts, heart failure, atrial fibrillation, or cognitive dysfunction. Leveraging phenotype correlations, we observed evidence of a protective association with cerebral infarction and vascular occlusion. These results explore the effects of direct PCSK9 inhibition; off-target effects cannot be predicted using this approach. This result represents the first genetic evidence in a large cohort for the protective effect of PCSK9 inhibition on ischemic stroke and corroborates exploratory evidence from clinical trials. PCSK9 inhibition was not associated with variables other than those related to LDL (low-density lipoprotein) cholesterol, atherosclerosis, and type 2 diabetes mellitus, suggesting that other effects are either small or absent.
View details for PubMedID 29997226
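The abstract reads odds ratios below 1 as protective and above 1 as risk-increasing. To make that interpretation concrete, the sketch below computes the textbook odds ratio from a 2×2 table of carrier counts, with a Woolf (log-scale Wald) 95% interval. The abstract does not specify its models beyond association testing and a Bayesian phenome-wide method, so this is not the study's analysis. The function name and any counts used with it are hypothetical.

```python
import math

def odds_ratio_ci(case_carriers, control_carriers, case_noncarriers, control_noncarriers, z=1.96):
    """Odds ratio and Woolf confidence interval from a 2x2 table of counts.

    OR = (a/c) / (b/d) = a*d / (b*c): the odds of carrying the allele among cases
    divided by the odds among controls. OR < 1 suggests a protective allele.
    """
    a, b, c, d = case_carriers, control_carriers, case_noncarriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    return (or_,
            math.exp(math.log(or_) - z * se_log),
            math.exp(math.log(or_) + z * se_log))
```

The interval is symmetric on the log scale, not on the odds-ratio scale. For that reason, published intervals for protective alleles usually look lopsided around the point estimate.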
-
Ubiquitination of ABCE1 by NOT4 in Response to Mitochondrial Damage Links Co-translational Quality Control to PINK1-Directed Mitophagy.
Cell metabolism
2018
Abstract
Translation of mRNAs is tightly regulated and constantly surveyed for errors. Aberrant translation can trigger co-translational protein and RNA quality control processes, impairments of which cause neurodegeneration by still poorly understood mechanism(s). Here we show that quality control of translation of mitochondrial outer membrane (MOM)-localized mRNA intersects with the turnover of damaged mitochondria, both orchestrated by the mitochondrial kinase PINK1. Mitochondrial damage causes stalled translation of complex-I 30 kDa subunit (C-I30) mRNA on MOM, triggering the recruitment of co-translational quality control factors Pelo, ABCE1, and NOT4 to the ribosome/mRNA-ribonucleoprotein complex. Damage-induced ubiquitination of ABCE1 by NOT4 generates poly-ubiquitin signals that attract autophagy receptors to MOM to initiate mitophagy. In the Drosophila PINK1 model, these factors act synergistically to restore mitophagy and neuromuscular tissue integrity. Thus ribosome-associated co-translational quality control generates an early signal to trigger mitophagy. Our results have broad therapeutic implications for the understanding and treatment of neurodegenerative diseases.
View details for PubMedID 29861391
-
Recurrently Mutated Genes Differ between Leptomeningeal and Solid Lung Cancer Brain Metastases.
Journal of thoracic oncology : official publication of the International Association for the Study of Lung Cancer
2018
Abstract
When compared with solid brain metastases from NSCLC, leptomeningeal disease (LMD) has unique growth patterns and is rapidly fatal. Patients with LMD do not undergo surgical resection, limiting the tissue available for scientific research. In this study we performed whole exome sequencing on eight samples of LMD to identify somatic mutations and compared the results with those for 26 solid brain metastases. We found that taste 2 receptor member 31 gene (TAS2R31) and phosphodiesterase 4D interacting protein gene (PDE4DIP) were recurrently mutated among LMD samples, suggesting involvement in LMD progression. Together with a retrospective review of the charts of an additional 44 patients with NSCLC LMD, we discovered a surprisingly low number of KRAS mutations (n = 4 [7.7%]) but a high number of EGFR mutations (n = 33 [63.5%]). The median interval for development of LMD from NSCLC was shorter in patients with mutant EGFR (16.3 months) than in patients with wild-type EGFR (23.9 months) (p = 0.017). Targeted analysis of recurrent mutations thus presents a useful complement to the existing diagnostic tool kit, and correlations of EGFR in LMD and KRAS in solid metastases suggest that molecular distinctions or systemic treatment pressure underpin the differences in growth patterns within the brain.
View details for PubMedID 29604399
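The KRAS and EGFR percentages reported above are consistent with pooling the 8 sequenced LMD samples and the 44 chart-reviewed patients into one denominator of 52. This reading is inferred from the arithmetic and is not stated explicitly in the abstract.

```latex
\frac{4}{8 + 44} = \frac{4}{52} \approx 7.7\%, \qquad \frac{33}{52} \approx 63.5\%
```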
-
Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder
AMERICAN JOURNAL OF HUMAN GENETICS
2018; 102 (3): 494–504
Abstract
ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.
View details for PubMedID 29478781
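In HGVS coding-DNA notation, c.1 is the A of the ATG start codon, so the codon that contains c.N is ⌈N/3⌉. The small helper below (hypothetical, for illustration) recovers the codon numbers in the protein-level names of the two ATP5F1D variants above. It assumes exonic positions only, so intronic offsets such as c.245+1 are not handled.

```python
def codon_of(c_position):
    """Return (codon number, position within codon 1-3) for an exonic HGVS position c.N.

    Assumes c.1 is the A of the ATG start codon; intronic offsets are not handled.
    """
    if c_position < 1:
        raise ValueError("only positive coding positions are handled")
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1

# The two ATP5F1D variants described above.
for variant, pos in (("c.245C>T", 245), ("c.317T>G", 317)):
    codon, offset = codon_of(pos)
    print(f"{variant}: codon {codon}, position {offset}")
# prints:
# c.245C>T: codon 82, position 2
# c.317T>G: codon 106, position 2
```

Both substitutions hit the second base of their codon, which fits the reported amino acid changes. Proline codons are CCN and leucine codons are CTN, so C>T at the second position gives p.Pro82Leu. Valine codons are GTN and glycine codons are GGN, so T>G at the second position gives p.Val106Gly.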
-
Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci.
American journal of human genetics
2018
Abstract
Coronary artery disease (CAD) is the leading cause of death globally. Genome-wide association studies (GWASs) have identified more than 95 independent loci that influence CAD risk, most of which reside in non-coding regions of the genome. To interpret these loci, we generated transcriptome and whole-genome datasets using human coronary artery smooth muscle cells (HCASMCs) from 52 unrelated donors, as well as epigenomic datasets using ATAC-seq on a subset of 8 donors. Through systematic comparison with publicly available datasets from GTEx and ENCODE projects, we identified transcriptomic, epigenetic, and genetic regulatory mechanisms specific to HCASMCs. We assessed the relevance of HCASMCs to CAD risk using transcriptomic and epigenomic level analyses. By jointly modeling eQTL and GWAS datasets, we identified five genes (SIPA1, TCF21, SMAD3, FES, and PDGFRA) that may modulate CAD risk through HCASMCs, all of which have relevant functional roles in vascular remodeling. Comparison with GTEx data suggests that SIPA1 and PDGFRA influence CAD risk predominantly through HCASMCs, while other annotated genes may have multiple cell and tissue targets. Together, these results provide tissue-specific and mechanistic insights into the regulation of a critical vascular cell type associated with CAD in human populations.
View details for PubMedID 30146127
-
Functional regulatory mechanism of smooth muscle cell-restricted LMOD1 coronary artery disease locus.
PLoS genetics
2018; 14 (11): e1007755
Abstract
Recent genome-wide association studies (GWAS) have identified multiple new loci which appear to alter coronary artery disease (CAD) risk via arterial wall-specific mechanisms. One of the annotated genes encodes LMOD1 (Leiomodin 1), a member of the actin filament nucleator family that is highly enriched in smooth muscle-containing tissues such as the artery wall. However, it is still unknown whether LMOD1 is the causal gene at this locus and also how the associated variants alter LMOD1 expression/function and CAD risk. Using epigenomic profiling we recently identified a non-coding regulatory variant, rs34091558, which is in tight linkage disequilibrium (LD) with the lead CAD GWAS variant, rs2820315. Herein we demonstrate, through expression quantitative trait loci (eQTL) and statistical fine-mapping in GTEx, STARNET, and human coronary artery smooth muscle cell (HCASMC) datasets, that rs34091558 is the top regulatory variant for LMOD1 in vascular tissues. Position weight matrix (PWM) analyses identify the protective allele rs34091558-TA to form a conserved Forkhead box O3 (FOXO3) binding motif, which is disrupted by the risk allele rs34091558-A. FOXO3 chromatin immunoprecipitation and reporter assays show reduced FOXO3 binding and LMOD1 transcriptional activity by the risk allele, consistent with effects of FOXO3 downregulation on LMOD1. LMOD1 knockdown results in increased proliferation and migration and decreased cell contraction in HCASMC; together with immunostaining of atherosclerotic lesions in the SMC lineage tracing reporter mouse, these findings support a key role for LMOD1 in maintaining the differentiated SMC phenotype. These results provide compelling functional evidence that genetic variation is associated with dysregulated LMOD1 expression/function in SMCs, together contributing to the heritable risk for CAD.
View details for PubMedID 30444878
-
Allele-specific expression reveals interactions between genetic variation and environment.
Nature methods
2017
Abstract
Identifying interactions between genetics and the environment (GxE) remains challenging. We have developed EAGLE, a hierarchical Bayesian model for identifying GxE interactions based on associations between environmental variables and allele-specific expression. Combining whole-blood RNA-seq with extensive environmental annotations collected from 922 human individuals, we identified 35 GxE interactions, compared with only four using standard GxE interaction testing. EAGLE provides new opportunities for researchers to identify GxE interactions using functional genomic data.
View details for DOI 10.1038/nmeth.4298
View details for PubMedID 28530654
-
Population- and individual-specific regulatory variation in Sardinia
NATURE GENETICS
2017; 49 (5): 700-?
Abstract
Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
View details for DOI 10.1038/ng.3840
View details for Web of Science ID 000400051400010
View details for PubMedID 28394350
-
The impact of structural variation on human gene expression
NATURE GENETICS
2017; 49 (5): 692-?
Abstract
Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs (a substantially higher fraction than prior estimates) and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.
View details for DOI 10.1038/ng.3834
View details for PubMedID 28369037
-
Overexpression of the Cytokine BAFF and Autoimmunity Risk
NEW ENGLAND JOURNAL OF MEDICINE
2017; 376 (17): 1615-1626
Abstract
Genomewide association studies of autoimmune diseases have mapped hundreds of susceptibility regions in the genome. However, only for a few association signals has the causal gene been identified, and for even fewer have the causal variant and underlying mechanism been defined. Coincident associations of DNA variants affecting both the risk of autoimmune disease and quantitative immune variables provide an informative route to explore disease mechanisms and drug-targetable pathways. Using case-control samples from Sardinia, Italy, we performed a genomewide association study in multiple sclerosis followed by TNFSF13B locus-specific association testing in systemic lupus erythematosus (SLE). Extensive phenotyping of quantitative immune variables, sequence-based fine mapping, cross-population and cross-phenotype analyses, and gene-expression studies were used to identify the causal variant and elucidate its mechanism of action. Signatures of positive selection were also investigated. A variant in TNFSF13B, encoding the cytokine and drug target B-cell activating factor (BAFF), was associated with multiple sclerosis as well as SLE. The disease-risk allele was also associated with up-regulated humoral immunity through increased levels of soluble BAFF, B lymphocytes, and immunoglobulins. The causal variant was identified: an insertion-deletion variant, GCTGT→A (in which A is the risk allele), yielded a shorter transcript that escaped microRNA inhibition and increased production of soluble BAFF, which in turn up-regulated humoral immunity. Population genetic signatures indicated that this autoimmunity variant has been evolutionarily advantageous, most likely by augmenting resistance to malaria. A TNFSF13B variant was associated with multiple sclerosis and SLE, and its effects were clarified at the population, cellular, and molecular levels. (Funded by the Italian Foundation for Multiple Sclerosis and others.)
View details for DOI 10.1056/NEJMoa1610528
View details for Web of Science ID 000400071900005
View details for PubMedID 28445677
-
PML nuclear bodies contribute to the basal expression of the mTOR inhibitor DDIT4
SCIENTIFIC REPORTS
2017; 7
Abstract
The promyelocytic leukemia (PML) protein is an essential component of PML nuclear bodies (PML NBs) frequently lost in cancer. PML NBs coordinate chromosomal regions via modification of nuclear proteins that in turn may regulate genes in the vicinity of these bodies. However, few PML NB-associated genes have been identified. PML and PML NBs can also regulate mTOR and cell fate decisions in response to cellular stresses. We now demonstrate that PML depletion in U2OS cells or TERT-immortalized normal human diploid fibroblasts results in decreased expression of the mTOR inhibitor DDIT4 (REDD1). DNA and RNA immuno-FISH reveal that PML NBs are closely associated with actively transcribed DDIT4 loci, implicating these bodies in regulation of basal DDIT4 expression. Although PML silencing did reduce the sensitivity of U2OS cells to metabolic stress induced by metformin, PML loss did not inhibit the upregulation of DDIT4 in response to metformin, hypoxia-like (CoCl2) or genotoxic stress. Analysis of publicly available cancer data also revealed a significant correlation between PML and DDIT4 expression in several cancer types (e.g. lung, breast, prostate). Thus, these findings uncover a novel mechanism by which PML loss may contribute to mTOR activation and cancer progression via dysregulation of basal DDIT4 gene expression.
View details for DOI 10.1038/srep45038
View details for Web of Science ID 000397135000001
View details for PubMedID 28332630
-
Whole transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy (SMA-PME).
Human mutation
2017
Abstract
At least 15% of disease-causing mutations affect mRNA splicing. Many splicing mutations are missed in a clinical setting due to limitations of in silico prediction algorithms or their location in noncoding regions. Whole-transcriptome sequencing is a promising new tool to identify these mutations; however, it will be a challenge to obtain disease-relevant tissue for RNA. Here, we describe an individual with a sporadic atypical spinal muscular atrophy, in whom clinical DNA sequencing reported one pathogenic ASAH1 mutation (c.458A>G; p.Tyr153Cys). Transcriptome sequencing on patient leukocytes identified a highly significant and atypical ASAH1 isoform not explained by c.458A>G (p < 10^-16). Subsequent Sanger sequencing identified the splice mutation responsible for the isoform (c.504A>C; p.Lys168Asn) and provided a molecular diagnosis of autosomal-recessive spinal muscular atrophy with progressive myoclonic epilepsy. Our findings demonstrate the utility of RNA sequencing from blood to identify splice-impacting disease mutations for nonhematological conditions, providing a diagnosis for these otherwise unsolved patients.
View details for DOI 10.1002/humu.23211
View details for PubMedID 28251733
-
Small RNA Sequencing in Cells and Exosomes Identifies eQTLs and 14q32 as a Region of Active Export
G3-GENES GENOMES GENETICS
2017; 7 (1): 31-39
Abstract
Exosomes are small extracellular vesicles that carry heterogeneous cargo, including RNA, between cells. Increasing evidence suggests that exosomes are important mediators of intercellular communication and biomarkers of disease. Despite this, the variability of exosomal RNA between individuals has not been well quantified. To assess this variability, we sequenced the small RNA of cells and exosomes from a 17-member family. Across individuals, we show that selective export of miRNAs occurs not only at the level of specific transcripts, but that a cluster of 74 mature miRNAs on chromosome 14q32 is massively exported in exosomes while mostly absent from cells. We also observe more interindividual variability between exosomal samples than between cellular ones and identify four miRNA expression quantitative trait loci shared between cells and exosomes. Our findings indicate that genomically colocated miRNAs can be exported together and highlight the variability in exosomal miRNA levels between individuals as relevant for exosome use as diagnostics.
View details for DOI 10.1534/g3.116.036137
View details for Web of Science ID 000392200800003
View details for PubMedID 27799337
View details for PubMedCentralID PMC5217120
-
FIRE: functional inference of genetic variants that regulate gene expression.
Bioinformatics (Oxford, England)
2017; 33 (24): 3895–3901
Abstract
Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. Availability and implementation: FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. Contact: nilah@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
View details for PubMedID 28961785
-
Long-read genome sequencing identifies causal structural variation in a Mendelian disease.
Genetics in medicine : official journal of the American College of Medical Genetics
2017
Abstract
Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS. Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative. Results: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants. Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS. GENETICS in MEDICINE advance online publication, 22 June 2017; doi:10.1038/gim.2017.86.
View details for PubMedID 28640241
-
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.
Genome medicine
2017; 9 (1): 98
Abstract
Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
View details for PubMedID 29178968
-
Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions.
American journal of epidemiology
2017; 186 (7): 771–77
Abstract
A growing knowledge base of genetic and environmental information has greatly enabled the study of disease risk factors. However, the computational complexity and statistical burden of testing all variants by all environments has required novel study designs and hypothesis-driven approaches. We discuss how incorporating biological knowledge from model organisms, functional genomics, and integrative approaches can empower the discovery of novel gene-environment interactions and discuss specific methodological considerations with each approach. We consider specific examples where the application of these approaches has uncovered effects of gene-environment interactions relevant to drug response and immunity, and we highlight how such improvements enable a greater understanding of the pathogenesis of disease and the realization of precision medicine.
View details for DOI 10.1093/aje/kwx229
View details for PubMedID 28978191
-
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases.
American journal of epidemiology
2017; 186 (7): 753–61
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including improved data-analysis methods, environmental exposure assessment, and incorporation of functional data and annotations.
View details for DOI 10.1093/aje/kwx227
View details for PubMedID 28978193
-
Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease.
Nature genetics
2017; 49 (12): 1664–70
Abstract
Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.
View details for DOI 10.1038/ng.3969
View details for PubMedID 29019975
-
The impact of rare variation on gene expression across tissues.
Nature
2017; 550 (7675): 239–43
Abstract
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
View details for PubMedID 29022581
-
Genetic effects on gene expression across human tissues.
Nature
2017; 550 (7675): 204–13
Abstract
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
View details for PubMedID 29022597
-
A TNFRSF14-FcεRI-mast cell pathway contributes to development of multiple features of asthma pathology in mice
NATURE COMMUNICATIONS
2016; 7
Abstract
Asthma has multiple features, including airway hyperreactivity, inflammation and remodelling. The TNF superfamily member TNFSF14 (LIGHT), via interactions with the receptor TNFRSF14 (HVEM), can support TH2 cell generation and longevity and promote airway remodelling in mouse models of asthma, but the mechanisms by which TNFSF14 functions in this setting are incompletely understood. Here we find that mouse and human mast cells (MCs) express TNFRSF14 and that TNFSF14:TNFRSF14 interactions can enhance IgE-mediated MC signalling and mediator production. In mouse models of asthma, TNFRSF14 blockade with a neutralizing antibody administered after antigen sensitization, or genetic deletion of Tnfrsf14, diminishes plasma levels of antigen-specific IgG1 and IgE antibodies, airway hyperreactivity, airway inflammation and airway remodelling. Finally, by analysing two types of genetically MC-deficient mice after engrafting MCs that either do or do not express TNFRSF14, we show that TNFRSF14 expression on MCs significantly contributes to the development of multiple features of asthma pathology.
View details for DOI 10.1038/ncomms13696
View details for Web of Science ID 000389853400001
View details for PubMedID 27982078
View details for PubMedCentralID PMC5171877
-
Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells.
Nature methods
2016
Abstract
Engineering and study of protein function by directed evolution has been limited by the technical requirement to use global mutagenesis or introduce DNA libraries. Here, we develop CRISPR-X, a strategy to repurpose the somatic hypermutation machinery for protein engineering in situ. Using catalytically inactive dCas9 to recruit variants of activation-induced cytidine deaminase (AID) with MS2-modified sgRNAs, we can specifically mutagenize endogenous targets with limited off-target damage. This generates diverse libraries of localized point mutations and can target multiple genomic locations simultaneously. We mutagenize GFP and select for spectrum-shifted variants, including EGFP. Additionally, we mutate the target of the cancer therapeutic bortezomib, PSMB5, and identify known and novel mutations that confer bortezomib resistance. Finally, using a hyperactive AID variant, we mutagenize loci both upstream and downstream of transcriptional start sites. These experiments illustrate a powerful approach to create complex libraries of genetic variants in native context, which is broadly applicable to investigate and improve protein function.
View details for DOI 10.1038/nmeth.4038
View details for PubMedID 27798611
-
DNA Methylation Profiling of Uniparental Disomy Subjects Provides a Map of Parental Epigenetic Bias in the Human Genome.
American journal of human genetics
2016; 99 (3): 555-566
Abstract
Genomic imprinting is a mechanism in which gene expression varies depending on parental origin. Imprinting occurs through differential epigenetic marks on the two parental alleles, with most imprinted loci marked by the presence of differentially methylated regions (DMRs). To identify sites of parental epigenetic bias, here we have profiled DNA methylation patterns in a cohort of 57 individuals with uniparental disomy (UPD) for 19 different chromosomes, defining imprinted DMRs as sites where the maternal and paternal methylation levels diverge significantly from the biparental mean. Using this approach we identified 77 DMRs, including nearly all those described in previous studies, in addition to 34 DMRs not previously reported. These include a DMR at TUBGCP5 within the recurrent 15q11.2 microdeletion region, suggesting potential parent-of-origin effects associated with this genomic disorder. We also observed a modest parental bias in DNA methylation levels at every CpG analyzed across ∼1.9 Mb of the 15q11-q13 Prader-Willi/Angelman syndrome region, demonstrating that the influence of imprinting is not limited to individual regulatory elements such as CpG islands, but can extend across entire chromosomal domains. Using RNA-seq data, we detected signatures consistent with imprinted expression associated with nine novel DMRs. Finally, using a population sample of 4,004 blood methylomes, we define patterns of epigenetic variation at DMRs, identifying rare individuals with global gain or loss of methylation across multiple imprinted loci. Our data provide a detailed map of parental epigenetic bias in the human genome, providing insights into potential parent-of-origin effects.
View details for DOI 10.1016/j.ajhg.2016.06.032
View details for PubMedID 27569549
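The abstract defines imprinted DMRs as sites where maternal and paternal methylation levels diverge from the biparental mean. Below is a minimal sketch of that comparison for a single CpG. It rests on stated assumptions. Inputs are methylation beta values (0 to 1). A fixed effect-size cutoff (`min_shift`, hypothetical) stands in for the study's significance testing. Requiring the two parental shifts to point in opposite directions is a simplification made here, not a rule taken from the paper. The function name `parental_bias` is illustrative only.

```python
from statistics import mean


def parental_bias(mat_upd, pat_upd, biparental, min_shift=0.15):
    """Compare maternal- and paternal-UPD methylation at one CpG with the biparental mean.

    All inputs are methylation beta values in [0, 1]. `min_shift` is a hypothetical
    effect-size cutoff used instead of a formal significance test.
    Returns (maternal_shift, paternal_shift, is_candidate_dmr).
    """
    base = mean(biparental)
    mat_shift = mean(mat_upd) - base
    pat_shift = mean(pat_upd) - base
    # Assumption: an imprinted site moves in opposite directions in the two UPD groups.
    opposite = mat_shift * pat_shift < 0
    # Both parental shifts must exceed the (hypothetical) effect-size cutoff.
    large = min(abs(mat_shift), abs(pat_shift)) >= min_shift
    return mat_shift, pat_shift, opposite and large
```

Consider a site near full methylation in maternal UPD and near zero in paternal UPD, with biparental samples near 50%. That is the classic imprinted pattern, and the sketch flags it. A site near 50% in all three groups is not flagged.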
-
Impact of the X Chromosome and sex on regulatory variation.
Genome research
2016; 26 (6): 768-777
Abstract
The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences promises to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits, identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.
View details for DOI 10.1101/gr.197897.115
View details for PubMedID 27197214
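The abstract does not state the statistical model used to detect sex-specific regulatory variants. One common formulation, assumed here rather than taken from the paper, regresses expression on genotype dosage, sex, and their product. The interaction coefficient then captures a sex-specific genetic effect. The function name and the 0/1 sex coding are hypothetical. How male hemizygous X genotypes are coded is a separate modeling choice that this sketch does not address.

```python
import numpy as np


def sex_interaction_eqtl(expression, dosage, is_female):
    """Fit expression ~ 1 + dosage + sex + dosage:sex by ordinary least squares.

    Returns coefficients in the order [intercept, genotype, sex, genotype_x_sex].
    A non-zero genotype_x_sex term means the variant's effect differs by sex.
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(dosage, dtype=float)      # alternate-allele dosage, 0 to 2
    s = np.asarray(is_female, dtype=float)   # hypothetical coding: 1 = female, 0 = male
    design = np.column_stack([np.ones_like(g), g, s, g * s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

With this coding, the genotype slope in males is `coef[1]` and in females it is `coef[1] + coef[3]`. A real analysis would also include covariates and a significance test for the interaction term. Both are omitted here.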
-
An Efficient Multiple-Testing Adjustment for eQTL Studies that Accounts for Linkage Disequilibrium between Variants.
American journal of human genetics
2016; 98 (1): 216-224
Abstract
Methods for multiple-testing correction in local expression quantitative trait locus (cis-eQTL) studies are a trade-off between statistical power and computational efficiency. Bonferroni correction, though computationally trivial, is overly conservative and fails to account for linkage disequilibrium between variants. Permutation-based methods are more powerful, though computationally far more intensive. We present an alternative correction method called eigenMT, which runs over 500 times faster than permutations and has adjusted p values that closely approximate empirical ones. To achieve this speed while also maintaining the accuracy of permutation-based methods, we estimate the effective number of independent variants tested for association with a particular gene, termed Meff, by using the eigenvalue decomposition of the genotype correlation matrix. We employ a regularized estimator of the correlation matrix to ensure Meff is robust and yields adjusted p values that closely approximate p values from permutations. Finally, using a common genotype matrix, we show that eigenMT can be applied with even greater efficiency to studies across tissues or conditions. Our method provides a simpler, more efficient approach to multiple-testing correction than existing methods and fits within existing pipelines for eQTL discovery.
View details for DOI 10.1016/j.ajhg.2015.11.021
View details for Web of Science ID 000368050800016
View details for PubMedID 26749306
View details for PubMedCentralID PMC4716687
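The Meff idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the eigenMT code. It counts how many leading eigenvalues of a regularized genotype correlation matrix are needed to explain a chosen share of the total variance. It then multiplies the gene's smallest nominal p value by that count. Several pieces are assumptions. The 99% variance threshold and the simple shrinkage toward the identity matrix stand in for the paper's regularized estimator. The names `effective_tests` and `adjust_min_p` are hypothetical.

```python
import numpy as np


def effective_tests(genotypes, var_thresh=0.99, shrinkage=0.01):
    """Estimate Meff for one gene's cis window.

    genotypes: (n_samples, n_variants) dosage matrix.
    var_thresh and shrinkage are illustrative defaults, not eigenMT's settings.
    """
    g = np.asarray(genotypes, dtype=float)
    # Monomorphic variants have zero variance and an undefined correlation; drop them.
    g = g[:, g.std(axis=0) > 0]
    m = g.shape[1]
    if m <= 1:
        return m
    corr = np.corrcoef(g, rowvar=False)
    # Stand-in regularizer: shrink the correlation matrix toward the identity.
    corr = (1.0 - shrinkage) * corr + shrinkage * np.eye(m)
    eigvals = np.clip(np.sort(np.linalg.eigvalsh(corr))[::-1], 0.0, None)
    frac = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of leading eigenvalues whose share of variance reaches var_thresh.
    k = int(np.searchsorted(frac, var_thresh, side="left")) + 1
    return min(k, m)


def adjust_min_p(min_p, meff):
    """Bonferroni-style correction that uses Meff instead of the raw variant count."""
    return min(1.0, min_p * meff)
```

For a cis window in perfect linkage disequilibrium, the sketch returns Meff = 1, so the gene is penalized as if only one test had been run. For uncorrelated variants, Meff approaches the number of variants and the correction approaches Bonferroni.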
-
ORegAnno 3.0: a community-driven resource for curated regulatory annotation.
Nucleic acids research
2016; 44 (D1): D126-32
Abstract
The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high-quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.
View details for DOI 10.1093/nar/gkv1203
View details for PubMedID 26578589
View details for PubMedCentralID PMC4702855
-
Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci.
Nature communications
2016; 7: 12092
Abstract
Coronary artery disease (CAD) is the leading cause of mortality and morbidity, driven by both genetic and environmental risk factors. Meta-analyses of genome-wide association studies have identified >150 loci associated with CAD and myocardial infarction susceptibility in humans. A majority of these variants reside in non-coding regions and are co-inherited with hundreds of candidate regulatory variants, presenting a challenge to elucidate their functions. Herein, we use integrative genomic, epigenomic and transcriptomic profiling of perturbed human coronary artery smooth muscle cells and tissues to begin to identify causal regulatory variation and mechanisms responsible for CAD associations. Using these genome-wide maps, we prioritize 64 candidate variants and perform allele-specific binding and expression analyses at seven top candidate loci: 9p21.3, SMAD3, PDGFD, IL6R, BMP1, CCDC97/TGFB1 and LMOD1. We validate our findings in expression quantitative trait loci cohorts, which together reveal new links between CAD associations and regulatory function in the appropriate disease context.
View details for DOI 10.1038/ncomms12092
View details for PubMedID 27386823
-
Non-Coding Loss-of-Function Variation in Human Genomes.
Human heredity
2016; 81 (2): 78-87
Abstract
Whole-genome and exome sequencing in human populations has revealed the tolerance of each gene for loss-of-function variation. By understanding this tolerance, it has become increasingly possible to identify genes that would make safe therapeutic targets and to identify rare genetic risk factors and phenotypes at the scale of individual genomes. To date, the vast majority of surveyed loss-of-function variants are in protein-coding regions of the genome mainly due to the focus on these regions by exome-based sequencing projects and their relative ease of interpretability. As whole-genome sequencing becomes more prevalent, new strategies will be required to uncover impactful variation in non-coding regions of the genome where the architecture of genome function is more complex. In this review, we investigate recent studies of loss-of-function variation and emerging approaches for interpreting whole-genome sequencing data to identify rare and impactful non-coding loss-of-function variants.
View details for DOI 10.1159/000447453
View details for Web of Science ID 000392559600029
View details for PubMedID 28076858
-
A global reference for human genetic variation.
Nature
2015; 526 (7571): 68
Abstract
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
View details for DOI 10.1038/nature15393
View details for Web of Science ID 000362095100036
-
The landscape of genomic imprinting across diverse adult human tissues.
Genome research
2015; 25 (7): 927-936
Abstract
Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.
View details for DOI 10.1101/gr.192278.115
View details for Web of Science ID 000357356900001
View details for PubMedID 25953952
View details for PubMedCentralID PMC4484390
-
Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome.
Science
2015; 348 (6235): 666-669
Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
View details for DOI 10.1126/science.1261877
View details for Web of Science ID 000354045700038
View details for PubMedID 25954003
View details for PubMedCentralID PMC4537935
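The abstract refers to positional effects on nonsense-mediated decay (NMD) and an improved predictive model, but it does not give that model. For background only, the sketch below encodes the long-standing "50-nucleotide rule". Under this rule, a premature termination codon (PTC) lying more than about 50 to 55 nt upstream of the last exon-exon junction is expected to trigger NMD. PTCs in the last exon, or close to the final junction, tend to escape it. This is the classic heuristic, not the authors' model. The function name, coordinate convention, and default threshold are assumptions.

```python
def predicts_nmd(ptc_tx_pos, exon_ends, threshold=50):
    """Classic 50-nt rule heuristic for nonsense-mediated decay (NMD).

    ptc_tx_pos: position of the premature stop codon in transcript (mRNA) coordinates.
    exon_ends: transcript coordinates of the last base of each exon, ordered 5' to 3'.
    Returns True if the PTC lies more than `threshold` nt upstream of the last
    exon-exon junction, i.e. the transcript is predicted to undergo NMD.
    """
    if len(exon_ends) < 2:
        return False  # single-exon transcript: there is no downstream junction
    last_junction = exon_ends[-2]  # boundary between the penultimate and last exon
    return last_junction - ptc_tx_pos > threshold
```

For a transcript whose exons end at positions 100, 250, 400, and 600, a PTC at position 120 is predicted to cause NMD. PTCs at 380 (within 50 nt of the last junction) or at 500 (in the last exon) are predicted to escape.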
-
Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse.
Nature genetics
2015; 47 (5): 544-549
Abstract
Genomic imprinting is an epigenetic process that restricts gene expression to either the maternally or paternally inherited allele. Many theories have been proposed to explain its evolutionary origin, but understanding has been limited by a paucity of data mapping the breadth and dynamics of imprinting within any organism. We generated an atlas of imprinting spanning 33 mouse and 45 human developmental stages and tissues. Nearly all imprinted genes were imprinted in early development and either retained their parent-of-origin expression in adults or lost it completely. Consistent with an evolutionary signature of parental conflict, imprinted genes were enriched for coexpressed pairs of maternally and paternally expressed genes, showed accelerated expression divergence between human and mouse, and were more highly expressed than their non-imprinted orthologs in other species. Our approach demonstrates a general framework for the discovery of imprinting in any species and sheds light on the causes and consequences of genomic imprinting in mammals.
View details for DOI 10.1038/ng.3274
View details for PubMedID 25848752
View details for PubMedCentralID PMC4414907
-
Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing.
PLoS genetics
2015; 11 (1)
Abstract
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated with gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated with expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association with gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects is also reflected in the observation that allele-specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and, interestingly, a large amount of DNA methylation correlating with alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter-proximal and distant regulatory regions in alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.
View details for DOI 10.1371/journal.pgen.1004958
View details for PubMedID 25634236
View details for PubMedCentralID PMC4310612
-
RNA Sequencing and Analysis.
Cold Spring Harbor protocols
2015; 2015 (11): pdb.top084970
Abstract
RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcriptome. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.
View details for DOI 10.1101/pdb.top084970
View details for PubMedID 25870306
-
Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing.
Molecular psychiatry
2014; 19 (12): 1267-1274
Abstract
A study of genome-wide gene expression in major depressive disorder (MDD) was undertaken in a large population-based sample to determine whether altered expression levels of genes and pathways could provide insights into biological mechanisms that are relevant to this disorder. Gene expression studies have the potential to detect changes that may be because of differences in common or rare genomic sequence variation, environmental factors or their interaction. We recruited a European-ancestry sample of 463 individuals with recurrent MDD and 459 controls, obtained self-report and semi-structured interview data about psychiatric and medical history and other environmental variables, sequenced RNA from whole blood and genotyped a genome-wide panel of common single-nucleotide polymorphisms. We used analytical methods to identify MDD-related genes and pathways using all of these sources of information. In analyses of association between MDD and expression levels of 13 857 single autosomal genes, accounting for multiple technical, physiological and environmental covariates, a significant excess of low P-values was observed, but there was no significant single-gene association after genome-wide correction. Pathway-based analyses of expression data detected significant association of MDD with increased expression of genes in the interferon α/β signaling pathway. This finding could not be explained by potentially confounding diseases and medications (including antidepressants) or by computationally estimated proportions of white blood cell types. Although cause-effect relationships cannot be determined from these data, the results support the hypothesis that altered immune signaling has a role in the pathogenesis, manifestation, and/or the persistence and progression of MDD. Molecular Psychiatry advance online publication, 3 December 2013; doi:10.1038/mp.2013.161.
View details for DOI 10.1038/mp.2013.161
View details for Web of Science ID 000345423500004
View details for PubMedID 24296977
-
High-Resolution Transcriptome Analysis with Long-Read RNA Sequencing.
PLoS one
2014; 9 (9)
Abstract
RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.
View details for DOI 10.1371/journal.pone.0108095
View details for Web of Science ID 000342492700076
View details for PubMedCentralID PMC4176000
-
Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants.
American journal of human genetics
2014; 95 (3): 245-256
Abstract
Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.
View details for DOI 10.1016/j.ajhg.2014.08.004
View details for PubMedID 25192044
View details for PubMedCentralID PMC4157143
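The abstract defines rare variants by a minor allele frequency (MAF) below 0.01. The sketch below applies that cutoff to biallelic genotype dosages coded 0, 1, or 2 copies of the alternate allele. The function names are hypothetical. In a family design like this one, the frequency would presumably be estimated in a population sample rather than among the family members. That point is an assumption of this illustration.

```python
def minor_allele_frequency(dosages):
    """MAF from biallelic genotype dosages coded 0, 1 or 2 (copies of the alternate allele)."""
    if not dosages:
        raise ValueError("no genotypes supplied")
    alt_freq = sum(dosages) / (2 * len(dosages))
    # The minor allele is whichever allele is less common.
    return min(alt_freq, 1 - alt_freq)


def is_rare(dosages, threshold=0.01):
    """Apply the abstract's rare-variant definition: minor allele frequency < 0.01."""
    return minor_allele_frequency(dosages) < threshold
```

A single heterozygote among 100 individuals gives MAF = 0.005, which counts as rare. A single heterozygote among 50 gives exactly 0.01, which does not meet the strict "< 0.01" cutoff.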
-
Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture.
PLoS genetics
2014; 10 (8)
Abstract
Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV-transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.
View details for DOI 10.1371/journal.pgen.1004549
View details for PubMedID 25121757
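The 25.0% figure is the share of inter-individual expression variation attributed to population differences. One simple way to compute such a share for a single gene is shown below. It is an assumption, and the paper's variance-partitioning model may differ. The method takes the between-population sum of squares as a fraction of the total sum of squares (eta squared). The function name is hypothetical.

```python
from collections import defaultdict


def fraction_between_groups(values, groups):
    """Share of total sum of squares explained by group membership (eta squared)."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation at all, so nothing to attribute to groups
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    # Between-group sum of squares: group size times squared deviation of each group mean.
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return between_ss / total_ss
```

For expression values 0, 2, 4, 6 split into two populations as (0, 2) and (4, 6), the function returns 0.8. In that example, 80% of the variance lies between populations.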
-
Cis and trans effects of human genomic variants on gene expression.
PLoS genetics
2014; 10 (7): e1004461
Abstract
Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to disease when misregulated. In order to detect genetic variations affecting gene expression, we performed association analysis of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort in cis and in trans. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the expression level of the cis genes causally affects the expression level of the trans genes and discovered several causal relationships between variation in the expression level of a cis gene and variation in the expression level of a trans gene. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.
View details for DOI 10.1371/journal.pgen.1004461
View details for PubMedID 25010687
View details for PubMedCentralID PMC4091791
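The study splits variant-gene associations into cis and trans and reports discoveries at a 5% false discovery rate. The abstract does not name the window or the FDR procedure. The sketch below therefore makes two assumptions. It uses a hypothetical 1 Mb window around the transcription start site (TSS) for the cis/trans split. It also uses the Benjamini-Hochberg step-up procedure, one standard FDR method, for the discovery call. The function names are illustrative.

```python
def classify_pair(variant_chrom, variant_pos, gene_chrom, gene_tss, window=1_000_000):
    """Label a variant-gene pair 'cis' if the variant lies within `window` bp of the TSS."""
    if variant_chrom == gene_chrom and abs(variant_pos - gene_tss) <= window:
        return "cis"
    return "trans"


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return booleans marking discoveries under the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below k is a discovery (step-up rule).
    discoveries = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            discoveries[idx] = True
    return discoveries
```

Because the procedure steps up, p values of 0.04, 0.045, and 0.05 are all called significant at FDR 0.05. The largest one meets its threshold of (3/3) × 0.05, and that carries the smaller two with it.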
-
Determining causality and consequence of expression quantitative trait loci.
Human genetics
2014; 133 (6): 727-735
Abstract
Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalog provides an important resource for understanding the molecular basis of common genetic diseases. However, only now have both the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.
View details for DOI 10.1007/s00439-014-1446-0
View details for Web of Science ID 000336317000005
View details for PubMedID 24770875
-
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues.
PLoS genetics
2014; 10 (5)
Abstract
Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression, a factor that itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression, exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.
View details for DOI 10.1371/journal.pgen.1004304
View details for PubMedID 24786518
-
Dissecting the causal genetic mechanisms of coronary heart disease.
Current atherosclerosis reports
2014; 16 (5): 406
Abstract
Large-scale genome-wide association studies (GWAS) have identified 46 loci that are associated with coronary heart disease (CHD). Additionally, 104 independent candidate variants (false discovery rate of 5 %) have been identified (Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. Nat Genet 43:333-8, 2011; Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR et al. Nat Genet 45:25-33, 2012; C4D Genetics Consortium. Nat Genet 43:339-44, 2011). The majority of the causal genes in these loci function independently of conventional risk factors. It is postulated that a number of the CHD-associated genes regulate basic processes in the vascular cells involved in atherosclerosis, and that study of the signaling pathways that are modulated in this cell type by causal regulatory variation will provide critical new insights for targeting the initiation and progression of disease. In this review, we will discuss the types of experimental approaches and data that are critical to understanding the molecular processes that underlie the disease risk at 9p21.3, TCF21, SORT1, and other CHD-associated loci.
View details for DOI 10.1007/s11883-014-0406-4
View details for PubMedID 24623178
-
SplicePlot: a utility for visualizing splicing quantitative trait loci.
Bioinformatics
2014; 30 (7): 1025-1026
Abstract
RNA-sequencing has provided unprecedented resolution of alternative splicing and splicing quantitative trait loci (sQTLs). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command-line utility that produces intuitive visualizations of sQTLs and their effects. SplicePlot takes mapped RNA-seq reads in BAM format and genotype data in VCF format as input and outputs publication-quality sashimi plots, hive plots, and structure plots, enabling better investigation and understanding of the role of genetics in alternative splicing and transcript structure. Availability and Implementation: Source code and detailed documentation are available at http://montgomerylab.stanford.edu/spliceplot/index.html under Resources and at GitHub. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available. Contact: wu.eric.g@gmail.com or smontgom@stanford.edu.
View details for DOI 10.1093/bioinformatics/btt733
View details for PubMedID 24363378
-
Path-scan: a reporting tool for identifying clinically actionable variants.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2014; 19: 229-240
Abstract
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.
View details for PubMedID 24297550
-
Transcriptome analysis reveals differential splicing events in IPF lung tissue.
PloS one
2014; 9 (5)
Abstract
Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.
View details for DOI 10.1371/journal.pone.0097550
View details for PubMedID 24805851
-
High-resolution transcriptome analysis with long-read RNA sequencing.
PloS one
2014; 9 (9)
Abstract
RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcript. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.
View details for DOI 10.1371/journal.pone.0108095
View details for PubMedID 25251678
-
Transcriptome Analysis Reveals Differential Splicing Events in IPF Lung Tissue.
PloS one
2014; 9 (3): e92111
Abstract
Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.
View details for DOI 10.1371/journal.pone.0092111
View details for PubMedID 24647608
View details for PubMedCentralID PMC3960165
-
Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing.
Nature methods
2014; 11 (1): 51-54
Abstract
We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.
View details for DOI 10.1038/nmeth.2736
View details for PubMedID 24270603
View details for PubMedCentralID PMC3877737
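Allelic ratios like those measured by mmPCR-seq are often tested for allele-specific expression. A typical approach is an exact two-sided binomial test against an expected reference-allele fraction of 0.5. The Python sketch below illustrates that generic test. It is an assumption for illustration, not the analysis used in the paper, and the function names and read counts are hypothetical.

```python
from math import comb


def allelic_ratio(ref_count: int, alt_count: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_count / total


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k reference reads out of n against p = 0.5.

    The null distribution is symmetric at p = 0.5, so the two-sided p-value
    is twice the smaller tail probability, capped at 1.
    """
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)


# Hypothetical read counts at one heterozygous site
ref, alt = 80, 20
print(allelic_ratio(ref, alt))                     # 0.8
print(binomial_two_sided_p(ref, ref + alt) < 1e-6)  # True
```

Deep, uniform coverage matters here because the power of this test grows with the total read count `n` at each site.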
-
Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals
GENOME RESEARCH
2014; 24 (1): 14-24
Abstract
Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation: by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.
View details for DOI 10.1101/gr.155192.113
View details for PubMedID 24092820
-
Performance of genomic medicine.
Genome biology
2013; 14 (12): 316
Abstract
A report on the Cold Spring Harbor Laboratory meeting on Precision Medicine: Personal Genomes and Pharmacogenomics, held in Cold Spring Harbor, New York, USA, November 13-16, 2013.
View details for DOI 10.1186/gb4146
View details for PubMedID 24359965
-
Transcriptome and genome sequencing uncovers functional variation in humans.
Nature
2013; 501 (7468): 506-511
Abstract
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
View details for DOI 10.1038/nature12531
View details for Web of Science ID 000324826300049
View details for PubMedID 24037378
-
Systematic functional regulatory assessment of disease-associated variants.
Proceedings of the National Academy of Sciences of the United States of America
2013; 110 (23): 9607-9612
Abstract
Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.
View details for DOI 10.1073/pnas.1219099110
View details for PubMedID 23690573
-
Desktop transcriptome sequencing from archival tissue to identify clinically relevant translocations.
American journal of surgical pathology
2013; 37 (6): 796-803
Abstract
Somatic mutations, often translocations or single nucleotide variations, are pathognomonic for certain types of cancers and are increasingly of clinical importance for diagnosis and prediction of response to therapy. Conventional clinical assays only evaluate 1 mutation at a time, and targeted tests are often constrained to identify only the most common mutations. Genome-wide or transcriptome-wide high-throughput sequencing (HTS) of clinical samples offers an opportunity to evaluate for all clinically significant mutations with a single test. Recently a "desktop version" of HTS has become available, but most of the experience to date is based on data obtained from high-quality DNA from frozen specimens. In this study, we demonstrate, as a proof of principle, that translocations in sarcomas can be diagnosed from formalin-fixed paraffin-embedded (FFPE) tissue with desktop HTS. Using the first-generation MiSeq platform, full transcriptome sequencing was performed on FFPE material from archival blocks of 3 synovial sarcomas, 3 myxoid liposarcomas, 2 Ewing sarcomas, and 1 clear cell sarcoma. Mapping the reads to the "sarcomatome" (all 83 known genes involved in translocations and mutations in sarcoma) and using a novel algorithm for ranking fusion candidates, the pathognomonic fusions and the exact breakpoints were identified in all cases of synovial sarcoma, myxoid liposarcoma, and clear cell sarcoma. The Ewing sarcoma fusion gene was detectable in FFPE material only with a sequencing platform that generates greater sequencing depth. The results show that a single transcriptome HTS assay, from FFPE, has the potential to replace conventional molecular diagnostic techniques for the evaluation of clinically relevant mutations in cancer.
View details for DOI 10.1097/PAS.0b013e31827ad9b2
View details for PubMedID 23598961
-
The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes.
Genome research
2013; 23 (5): 749-761
Abstract
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
View details for DOI 10.1101/gr.148718.112
View details for PubMedID 23478400
View details for PubMedCentralID PMC3638132
-
Examination of the relationship between variation at 17q21 and childhood wheeze phenotypes
JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY
2013; 131 (3): 685-694
Abstract
Genome-wide association studies have identified associations of genetic variants at 17q21 near ORMDL3 with childhood asthma. We sought to determine whether associations in this region are specific to particular asthma phenotypes and specific to ORMDL3. We examined associations between 244 independent single nucleotide polymorphisms (SNPs) plus 13 previously identified asthma-related SNPs in the region between 34 and 36 Mb on chromosome 17 and early wheezing phenotypes, doctor-diagnosed asthma and atopy at 7½ years, and bronchial hyperresponsiveness and lung function at 8½ years in 7045 children from the Avon Longitudinal Study of Parents and Children birth cohort study. In addition, cis expression quantitative trait loci signals for the same SNPs were assessed in 875 samples across genes in the same region. The strongest evidence for phenotypic association was seen for persistent wheezing (rs8076131 near ORMDL3: relative risk ratio [RRR], 1.60 [95% CI, 1.40-1.84], P = 1.4 × 10^-11; rs2305480 near GSDML: RRR, 1.60 [95% CI, 1.39-1.83], P = 1.5 × 10^-11; and rs9303277 near IKZF3: RRR, 1.57 [95% CI, 1.37-1.79], P = 4.4 × 10^-11). Similar but less precisely estimated effects were seen for intermediate-onset wheeze, but there was little evidence of associations with other wheezing phenotypes. There was some evidence of associations with bronchial hyperresponsiveness. SNPs across the whole region show strong evidence of association with differential levels of expression at GSDML, IKZF3, and MED24, as well as ORMDL3. Associations of SNPs in the 17q21 locus are specific to asthma and specific wheezing phenotypes and are not explained by associations with intermediate phenotypes, such as atopy or lung function.
View details for DOI 10.1016/j.jaci.2012.09.021
View details for Web of Science ID 000315587800008
View details for PubMedID 23154084
-
Integrating GWAS and Expression Data for Functional Characterization of Disease-Associated SNPs: An Application to Follicular Lymphoma
AMERICAN JOURNAL OF HUMAN GENETICS
2013; 92 (1): 126-130
Abstract
Development of post-GWAS (genome-wide association study) methods is greatly needed for characterizing the function of trait-associated SNPs. Strategies integrating various biological data sets with GWAS results will provide insights into the mechanistic role of associated SNPs. Here, we present a method that integrates RNA sequencing (RNA-seq) and allele-specific expression data with GWAS data to further characterize SNPs associated with follicular lymphoma (FL). We investigated the influence on gene expression of three established FL-associated loci (rs10484561, rs2647012, and rs6457327) by measuring their correlation with human-leukocyte-antigen (HLA) expression levels obtained from publicly available RNA-seq expression data sets from lymphoblastoid cell lines. Our results suggest that SNPs linked to the protective variant rs2647012 exert their effect by a cis-regulatory mechanism involving modulation of HLA-DQB1 expression. In contrast, no effect on HLA expression was observed for the colocalized risk variant rs10484561. The application of integrative methods, such as those presented here, to other post-GWAS investigations will help identify causal disease variants and enhance our understanding of biological disease mechanisms.
View details for DOI 10.1016/j.ajhg.2012.11.009
View details for Web of Science ID 000313759000013
View details for PubMedID 23246294
View details for PubMedCentralID PMC3542469
-
Passive and active DNA methylation and the interplay with genetic variation in gene regulation.
eLife
2013; 2
Abstract
DNA methylation is an essential epigenetic mark whose role in gene regulation and its dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the association between DNA methylation and gene expression variation among individuals is likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that, unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele-specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and an active role of DNA methylation in regulatory interactions influencing gene expression. DOI: http://dx.doi.org/10.7554/eLife.00523.001.
View details for DOI 10.7554/eLife.00523
View details for PubMedID 23755361
-
Cancer Transcriptome Sequencing and Analysis
Cancer Genomics: From Bench to Personalized Medicine
Elsevier. 2013; 1: 31–49
View details for DOI 10.1016/B978-0-12-396967-5.00003-7
-
Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge.
PloS one
2013; 8 (7)
Abstract
Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well as or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
View details for DOI 10.1371/journal.pone.0068141
View details for PubMedID 23874524
-
Detection and impact of rare regulatory variants in human disease.
Frontiers in genetics
2013; 4: 67
Abstract
Advances in genome sequencing are providing unprecedented resolution of rare and private variants. However, methods which assess the effect of these variants have relied predominantly on information within coding sequences. Assessing their impact in non-coding sequences remains a significant contemporary challenge. In this review, we highlight the role of regulatory variation as causative agents and modifiers of monogenic disorders. We further discuss how advances in functional genomics are now providing new opportunities to assess the impact of rare non-coding variants and their role in disease.
View details for DOI 10.3389/fgene.2013.00067
View details for PubMedID 23755067
View details for PubMedCentralID PMC3668132
-
Sex-biased genetic effects on gene regulation in humans
GENOME RESEARCH
2012; 22 (12): 2368-2375
Abstract
Human regulatory variation, reported as expression quantitative trait loci (eQTLs), contributes to differences between populations and tissues. The contribution of eQTLs to differences between sexes, however, has not been investigated to date. Here we explore regulatory variation in females and males and demonstrate that 12%-15% of autosomal eQTLs function in a sex-biased manner. We show that genes possessing sex-biased eQTLs are expressed at similar levels across the sexes and highlight cases of genes controlling sexually dimorphic and shared traits that are under the control of distinct regulatory elements in females and males. This study illustrates that sex provides important context that can modify the effects of functional genetic variants.
View details for DOI 10.1101/gr.134981.111
View details for Web of Science ID 000311895500005
View details for PubMedID 22960374
-
Mapping cis- and trans-regulatory effects across multiple tissues in twins
NATURE GENETICS
2012; 44 (10): 1084
Abstract
Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
View details for DOI 10.1038/ng.2394
View details for Web of Science ID 000309550200006
View details for PubMedID 22941192
-
Genotype-Based Test in Mapping Cis-Regulatory Variants from Allele-Specific Expression Data
PLOS ONE
2012; 7 (6)
Abstract
Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning genotypes of individuals exhibiting AI and those not exhibiting AI in a 2×3 contingency table. The performance of this test in detecting linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test increasingly outperforms the haplotype-based tests as the distance separating the regulatory region from its regulated transcript grows. The genotype-based test performed equally well with the experimental AI datasets, from either genome-wide cDNA hybridization arrays or RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
View details for DOI 10.1371/journal.pone.0038667
View details for Web of Science ID 000305351700058
View details for PubMedID 22685595
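The 2×3 contingency-table idea described in this abstract can be sketched as follows. This is a minimal illustration only. It assumes a standard Pearson chi-square test of independence between AI status (rows) and genotype class (columns). The published test's exact statistic may differ, and the function name `genotype_ai_test` is hypothetical.

```python
import math


def genotype_ai_test(ai_counts, no_ai_counts):
    """Pearson chi-square test on a 2x3 table of genotype counts (illustrative sketch).

    ai_counts / no_ai_counts: numbers of individuals with genotypes (AA, AB, BB)
    among those showing / not showing allelic imbalance (AI). Only unphased
    genotypes are used, so no haplotype inference is needed.
    Returns (statistic, p_value).
    """
    table = [list(ai_counts), list(no_ai_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        # An empty row or genotype class would change the degrees of freedom.
        raise ValueError("every row and genotype class needs at least one individual")
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and the
    # chi-square survival function with 2 df is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical counts: AI carriers skew toward AA, non-carriers toward AB.
stat, p = genotype_ai_test((10, 0, 5), (0, 10, 5))
print(stat, p)  # 20.0 and exp(-10), about 4.5e-05
```

Because a 2×3 table always has two degrees of freedom, the p-value can be computed exactly with the standard library. In real analyses, sparse genotype classes would call for an exact or permutation test instead.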
-
Patterns of Cis Regulatory Variation in Diverse Human Populations
PLOS GENETICS
2012; 8 (4): 272-284
Abstract
The genetic basis of gene expression variation has long been studied with the aim of understanding the landscape of regulatory variants and, more recently, of assisting in the interpretation and elucidation of disease signals. To date, many studies have looked at specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asian, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site (TSS) as sharing of eQTLs among populations increases, highlighting that variants close to the TSS have stronger effects and are therefore more likely to be detected across a wider panel of populations.
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
View details for DOI 10.1371/journal.pgen.1002639
View details for Web of Science ID 000303441800020
View details for PubMedID 22532805
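Correlating expression with SNPs "located in cis to the genes" can be sketched as a per-gene scan over nearby variants. This is a simplified illustration, not the study's pipeline. The ±1 Mb window around the TSS, plain Pearson correlation, and the names `cis_eqtl_scan` and `pearson_r` are all assumptions. The published analysis also controlled for non-genetic factors and admixture, and it assessed significance, which this sketch omits.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns None if either vector has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def cis_eqtl_scan(tss, expression, snps, window=1_000_000):
    """Correlate one gene's expression with every SNP within `window` bp of its TSS.

    snps maps snp_id -> (position, dosages); dosages are 0/1/2 allele counts
    in the same individual order as `expression` (hypothetical data layout).
    Returns (snp_id, r) pairs sorted by |r|, strongest first.
    """
    results = []
    for snp_id, (position, dosages) in snps.items():
        if abs(position - tss) > window:
            continue  # outside the assumed cis window
        r = pearson_r(dosages, expression)
        if r is not None:  # skip monomorphic SNPs
            results.append((snp_id, r))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results


expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
snps = {
    "snp1": (1_050_000, [0, 0, 1, 1, 2, 2]),
    "snp2": (3_500_000, [0, 1, 2, 0, 1, 2]),  # 2.5 Mb from the TSS: excluded
    "snp3": (990_000, [1, 1, 1, 1, 1, 1]),    # monomorphic: skipped
}
print(cis_eqtl_scan(1_000_000, expression, snps))  # only snp1, r = 8 / sqrt(70)
```

Per-population scans like this could then be compared to see which cis-eQTLs are shared across populations.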
-
A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes
SCIENCE
2012; 335 (6070): 823-828
Abstract
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
View details for DOI 10.1126/science.1215040
View details for Web of Science ID 000300356400036
View details for PubMedID 22344438
View details for PubMedCentralID PMC3299548
-
Meta-analysis of genome-wide association studies identifies three new risk loci for atopic dermatitis
NATURE GENETICS
2012; 44 (2): 187-192
Abstract
Atopic dermatitis (AD) is a commonly occurring chronic skin disease with high heritability. Apart from filaggrin (FLG), the genes influencing atopic dermatitis are largely unknown. We conducted a genome-wide association meta-analysis of 5,606 affected individuals and 20,565 controls from 16 population-based cohorts and then examined the ten most strongly associated new susceptibility loci in an additional 5,419 affected individuals and 19,833 controls from 14 studies. Three SNPs reached genome-wide significance in the discovery and replication cohorts combined, including rs479844 upstream of OVOL1 (odds ratio (OR) = 0.88, P = 1.1 × 10⁻¹³) and rs2164983 near ACTL9 (OR = 1.16, P = 7.1 × 10⁻⁹), both of which are near genes that have been implicated in epidermal proliferation and differentiation, as well as rs2897442 in KIF3A within the cytokine cluster at 5q31.1 (OR = 1.11, P = 3.8 × 10⁻⁸). We also replicated association with the FLG locus and with two recently identified association signals at 11q13.5 (rs7927894; P = 0.008) and 20q13.33 (rs6010620; P = 0.002). Our results underline the importance of both epidermal barrier function and immune dysregulation in atopic dermatitis pathogenesis.
View details for DOI 10.1038/ng.1017
View details for Web of Science ID 000299664400018
View details for PubMedID 22197932
View details for PubMedCentralID PMC3272375
-
DNA methylation profiles of human active and inactive X chromosomes
GENOME RESEARCH
2011; 21 (10): 1592-1600
Abstract
X-chromosome inactivation (XCI) is a dosage compensation mechanism that silences the majority of genes on one X chromosome in each female cell. To characterize epigenetic changes that accompany this process, we measured DNA methylation levels in 45,X patients carrying a single active X chromosome (Xa), and in normal females, who carry one Xa and one inactive X (Xi). Methylated DNA was immunoprecipitated and hybridized to high-density oligonucleotide arrays covering the X chromosome, generating epigenetic profiles of active and inactive X chromosomes. We observed that XCI is accompanied by changes in DNA methylation specifically at CpG islands (CGIs). While the majority of CGIs show increased methylation levels on the Xi, XCI actually results in significant reductions in methylation at 7% of CGIs. Both intra- and inter-genic CGIs undergo epigenetic modification, with the biggest increase in methylation occurring at the promoters of genes silenced by XCI. In contrast, genes escaping XCI generally have low levels of promoter methylation, while genes that show inter-individual variation in silencing show intermediate increases in methylation. Thus, promoter methylation and susceptibility to XCI are correlated. We also observed a global correlation between CGI methylation and the evolutionary age of X-chromosome strata, and that genes escaping XCI show increased methylation within gene bodies. We used our epigenetic map to predict 26 novel genes escaping XCI, and searched for parent-of-origin-specific methylation differences, but found no evidence to support imprinting on the human X chromosome. Our study provides a detailed analysis of the epigenetic profile of active and inactive X chromosomes.
View details for DOI 10.1101/gr.112680.110
View details for Web of Science ID 000295407800004
View details for PubMedID 21862626
-
Epistatic Selection between Coding and Regulatory Variation in Human Evolution and Disease
AMERICAN JOURNAL OF HUMAN GENETICS
2011; 89 (3): 459-463
Abstract
Interaction (nonadditive effects) between genetic variants has been highlighted as an important mechanism underlying phenotypic variation, but the discovery of genetic interactions in humans has proved difficult. In this study, we show that the spectrum of variation in the human genome has been shaped by modifier effects of cis-regulatory variation on the functional impact of putatively deleterious protein-coding variants. We analyzed 1000 Genomes population-scale resequencing data from Europe (CEU [Utah residents with Northern and Western European ancestry from the CEPH collection]) and Africa (YRI [Yoruba in Ibadan, Nigeria]) together with gene expression data from arrays and RNA sequencing for the same samples. We observed an underrepresentation of derived putatively functional coding variation on the more highly expressed regulatory haplotype, which suggests stronger purifying selection against deleterious coding variants that have increased penetrance because of their regulatory background. Furthermore, the frequency spectrum and impact size distribution of common regulatory polymorphisms (eQTLs) appear to be shaped in order to minimize the selective disadvantage of having deleterious coding mutations on the more highly expressed haplotype. Interestingly, eQTLs explaining common disease GWAS signals showed an enrichment of putative epistatic effects, suggesting that some disease associations might arise from interactions increasing the penetrance of rare coding variants. In conclusion, our results indicate that regulatory and coding variants often modify the functional impact of each other. This specific type of genetic interaction is detectable from sequencing data in a genome-wide manner, and characterizing these joint effects might help us understand functional mechanisms behind genetic associations to human phenotypes, including both Mendelian and common disease.
View details for DOI 10.1016/j.ajhg.2011.08.004
View details for Web of Science ID 000294939800012
View details for PubMedID 21907014
-
Rare and Common Regulatory Variation in Population-Scale Sequenced Human Genomes
PLOS GENETICS
2011; 7 (7)
Abstract
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in an African and a European population sample and gene expression data from lymphoblastoid cell lines. We detect numbers of expression quantitative trait loci (eQTLs) comparable to those found with genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
View details for DOI 10.1371/journal.pgen.1002144
View details for Web of Science ID 000293338600007
View details for PubMedID 21811411
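The abstract links outlier expression values to rare variants. A common way to flag such outliers is a per-gene z-score across individuals, sketched below. The |z| > 2 cutoff, the use of the sample standard deviation, and the name `expression_outliers` are illustrative assumptions. The abstract does not state the paper's exact outlier definition.

```python
import statistics


def expression_outliers(values, threshold=2.0):
    """Return indices of individuals whose expression z-score exceeds `threshold` in magnitude.

    values: one gene's (already normalized) expression levels across individuals.
    """
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # no variation, so no outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Ten hypothetical individuals; the last one is extreme (z is about 2.85).
print(expression_outliers([1] * 9 + [10]))  # [9]
```

With the sample standard deviation, no z-score can exceed (n − 1)/√n, which is about 2.85 for n = 10. Any cutoff therefore has to be chosen with sample size in mind.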
-
Genome-wide association study identifies a common variant associated with risk of endometrial cancer
NATURE GENETICS
2011; 43 (5): 451
Abstract
Endometrial cancer is the most common malignancy of the female genital tract in developed countries. To identify genetic variants associated with endometrial cancer risk, we performed a genome-wide association study involving 1,265 individuals with endometrial cancer (cases) from Australia and the UK and 5,190 controls from the Wellcome Trust Case Control Consortium. We compared genotype frequencies in cases and controls for 519,655 SNPs. Forty-seven SNPs that showed evidence of association with endometrial cancer in stage 1 were genotyped in 3,957 additional cases and 6,886 controls. We identified an endometrial cancer susceptibility locus close to HNF1B at 17q12 (rs4430796, P = 7.1 × 10⁻¹⁰) that is also associated with risk of prostate cancer and is inversely associated with risk of type 2 diabetes.
View details for DOI 10.1038/ng.812
View details for Web of Science ID 000289972600015
View details for PubMedID 21499250
-
From expression QTLs to personalized transcriptomics
NATURE REVIEWS GENETICS
2011; 12 (4): 277-282
Abstract
Approaches that combine expression quantitative trait loci (eQTLs) and genome-wide association (GWA) studies are offering new functional information about the aetiology of complex human traits and diseases. Improved study designs--which take into account technological advances in resolving the transcriptome, cell history and state, population of origin and diverse endophenotypes--are providing insights into the architecture of disease and the landscape of gene regulation in humans. Furthermore, these advances are helping to establish links between cellular effects and organismal traits.
View details for DOI 10.1038/nrg2969
View details for Web of Science ID 000288531700011
View details for PubMedID 21386863
-
The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study
PLOS GENETICS
2011; 7 (2)
Abstract
While there have been studies exploring regulatory variation in one or more tissues, the complexity of tissue-specificity in multiple primary tissues is not yet well understood. We explore in depth the role of cis-regulatory variation in three human tissues: lymphoblastoid cell lines (LCL), skin, and fat. The samples (156 LCL, 160 skin, 166 fat) were derived simultaneously from a subset of well-phenotyped healthy female twins of the MuTHER resource. We discover an abundance of cis-eQTLs in each tissue similar to previous estimates (858 or 4.7% of genes). In addition, we apply factor analysis (FA) to remove effects of latent variables, thus more than doubling the number of our discoveries (1,822 eQTL genes). The unique study design (Matched Co-Twin Analysis--MCTA) permits immediate replication of eQTLs using co-twins (93%-98%) and validation of the considerable gain in eQTL discovery after FA correction. We highlight the challenges of comparing eQTLs between tissues. After verifying previous significance threshold-based estimates of tissue-specificity, we show their limitations given their dependency on statistical power. We propose that continuous estimates of the proportion of tissue-shared signals and direct comparison of the magnitude of effect on the fold change in expression are essential properties that jointly provide a biologically realistic view of tissue-specificity. Under this framework we demonstrate that 30% of eQTLs are shared among the three tissues studied, while another 29% appear exclusively tissue-specific. However, even among the shared eQTLs, a substantial proportion (10%-20%) have significant differences in the magnitude of fold change between genotypic classes across tissues. Our results underline the need to account for the complexity of eQTL tissue-specificity in an effort to assess consequences of such variants for complex traits.
View details for DOI 10.1371/journal.pgen.1002003
View details for Web of Science ID 000287697300035
View details for PubMedID 21304890
-
Identification of cis- and trans- regulatory variation modulating microRNA expression levels in human fibroblasts
GENOME RESEARCH
2011; 21 (1): 68-73
Abstract
MicroRNAs (miRNAs) are regulatory noncoding RNAs that affect the production of a significant fraction of human mRNAs via post-transcriptional regulation. Interindividual variation of the miRNA expression levels is likely to influence the expression of miRNA target genes and may therefore contribute to phenotypic differences in humans, including susceptibility to common disorders. The extent to which miRNA levels are genetically controlled is largely unknown. In this report, we assayed the expression levels of miRNAs in primary fibroblasts from 180 European newborns of the GenCord project and performed association analysis to identify eQTLs (expression quantitative trait loci). We detected robust expression for 121 miRNAs out of 365 interrogated. We have identified significant cis- (10%) and trans- (11%) eQTLs. Furthermore, we detected one genomic locus (rs1522653) that influences the expression levels of five miRNAs, thus unraveling a novel mechanism for coregulation of miRNA expression.
View details for DOI 10.1101/gr.109371.110
View details for Web of Science ID 000285868300007
View details for PubMedID 21147911
-
A map of human genome variation from population-scale sequencing
NATURE
2010; 467 (7319): 1061-1073
Abstract
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
View details for DOI 10.1038/nature09534
View details for Web of Science ID 000283548600039
View details for PubMedCentralID PMC3042601
-
Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies
BIOINFORMATICS
2010; 26 (19): 2474-2476
Abstract
Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. Availability: http://www.sanger.ac.uk/resources/software/genevar.
View details for DOI 10.1093/bioinformatics/btq452
View details for Web of Science ID 000282170000023
View details for PubMedID 20702402
-
Integrating common and rare genetic variation in diverse human populations
NATURE
2010; 467 (7311): 52-58
Abstract
Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of
View details for DOI 10.1038/nature09298
View details for Web of Science ID 000281461200033
View details for PubMedID 20811451
-
Transcriptome genetics using second generation sequencing in a Caucasian population
NATURE
2010; 464 (7289): 773-U151
Abstract
Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (single nucleotide polymorphisms) leads to the discovery of more eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts, indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.
View details for DOI 10.1038/nature08903
View details for Web of Science ID 000276205000048
View details for PubMedID 20220756
-
Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations
PLOS GENETICS
2010; 6 (4)
Abstract
The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with either one or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r²) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal of prioritizing relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
View details for DOI 10.1371/journal.pgen.1000895
View details for Web of Science ID 000277354200012
View details for PubMedID 20369022
-
Out of the sequencer and into the wiki as we face new challenges in genome informatics.
Genome biology
2010; 11 (10): 308
Abstract
A report on the joint Cold Spring Harbor Laboratory/Wellcome Trust Conference 'Genome Informatics', 15-19 September 2010, Hinxton, Cambridge, UK.
View details for DOI 10.1186/gb-2010-11-10-308
View details for PubMedID 21067526
-
Annotating the regulatory genome.
Methods in molecular biology (Clifton, N.J.)
2010; 674: 313-349
Abstract
Determining the timing and molecular repertoire responsible for gene expression is fundamental to understanding a gene's function. Heritable differences in this character are increasingly regarded as explanatory for complex and common traits. For many known trait-predisposing genes, studies have sought to elucidate the associated logic behind gene regulation. However, there exist many challenges in deciphering these mechanisms. Among them, it is recognized that we have limited understanding of regulatory complexity, that current models of gene regulation have low specificity, and that any gene's regulatory logic is dependent on biological context. Addressing these limitations and defining the regulatory genome is an ongoing challenge for molecular biology. We discuss current efforts to define and annotate the regulatory genome by focusing on curation and text-mining activities. We further highlight the type of information and curation process for describing regulatory elements within the ORegAnno database (www.oreganno.org) and how the general standards for such information are changing.
View details for DOI 10.1007/978-1-60761-854-6_20
View details for PubMedID 20827601
-
The resolution of the genetics of gene expression
HUMAN MOLECULAR GENETICS
2009; 18: R211-R215
Abstract
Understanding the influence of genetics on the molecular mechanisms underpinning human phenotypic diversity is fundamental to being able to predict health outcomes and treat disease. To interrogate the role of genetics on cellular state and function, gene expression has been extensively used. Past and present studies have highlighted important patterns of heritability, population differentiation and tissue-specificity in gene expression. Current and future studies are taking advantage of systems biology-based approaches and advances in sequencing technology: new methodology aims to translate regulatory networks to enrich pathways responsible for disease etiology, and second-generation sequencing now offers single-molecule resolution of the transcriptome, providing unprecedented information on the structural and genetic characteristics of gene expression. Such advances are leading to a future where rich cellular phenotypes will facilitate understanding of the transmission of genetic effects from gene to organism.
View details for DOI 10.1093/hmg/ddp400
View details for Web of Science ID 000271265600012
View details for PubMedID 19808798
-
Common Regulatory Variation Impacts Gene Expression in a Cell Type-Dependent Manner
SCIENCE
2009; 325 (5945): 1246-1250
Abstract
Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.
View details for DOI 10.1126/science.1174148
View details for Web of Science ID 000269523200038
View details for PubMedID 19644074
-
Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants?
DIABETOLOGIA
2009; 52 (9): 1846-1851
Abstract
According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci. We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (FST) for 17 type 2 diabetes and 13 obesity loci. We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We found a high FST for rs7901695 at TCF7L2, the locus with the largest type 2 diabetes effect size found to date. Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.
View details for DOI 10.1007/s00125-009-1419-3
View details for Web of Science ID 000268776100018
View details for PubMedID 19526209
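The fixation index (FST) mentioned in this abstract measures allele-frequency differentiation between populations. The abstract does not say which estimator was used. The sketch below uses the simple heterozygosity-based ratio FST = (HT − HS) / HT for one biallelic SNP, with equally weighted populations and no sample-size correction. Estimators such as Weir and Cockerham's differ in detail, and the name `fst_biallelic` is hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic SNP (illustrative sketch).

    freqs: allele frequency p in each population (populations weighted equally).
    HS is the mean within-population expected heterozygosity 2p(1 - p);
    HT is 2 * p_bar * (1 - p_bar) computed from the mean frequency p_bar.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # allele fixed identically everywhere: no differentiation
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5]))  # 0.0: identical frequencies
print(fst_biallelic([0.0, 1.0]))  # 1.0: fixed for different alleles
print(fst_biallelic([0.2, 0.8]))  # about 0.36
```

Under this definition, a locus such as the TCF7L2 SNP would show a "high FST" when its risk-allele frequency differs markedly between the populations compared.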
-
Current computational methods for prioritizing candidate regulatory polymorphisms.
Methods in molecular biology (Clifton, N.J.)
2009; 569: 89-114
Abstract
Discovery of DNA sequence variants responsible for human phenotypic variation is key to advances in molecular diagnostics and medicines. Historically, variants that alter the protein-coding sequence of genes have been targeted when attempting to identify a trait's etiology; this is done because the rules governing these regions are generally well-understood and candidate variants can be easily selected. However, the effects of variants on gene regulation are increasingly regarded as being as important as protein-coding variation in uncovering the nature of phenotypic variation. I discuss resources and methodology that have recently been developed to computationally prioritize variants that may alter gene expression.
View details for DOI 10.1007/978-1-59745-524-4_5
View details for PubMedID 19623487
-
ORegAnno: an open-access community-driven resource for regulatory annotation
NUCLEIC ACIDS RESEARCH
2008; 36: D107-D113
Abstract
ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30,145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54,351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.
View details for DOI 10.1093/nar/gkm967
View details for Web of Science ID 000252545400020
View details for PubMedID 18006570
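The abstract lists the parts of a typical ORegAnno record. As an illustration only, the sketch below models that record shape as a Python dataclass; the class name, field names and the `is_complete` rule are hypothetical and are not the ORegAnno schema, XML format or web-service API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegulatoryRecord:
    """Hypothetical in-memory shape of one curated record (field names are illustrative)."""
    species: str
    sequence_type: str               # e.g. binding site, regulatory region or regulatory variant
    sequence: str
    target_gene: str                 # e.g. an Ensembl or Entrez gene identifier
    binding_factor: Optional[str]    # None when no factor is implicated
    outcome: str                     # experimental outcome, e.g. positive or negative
    evidence: List[str] = field(default_factory=list)  # one or more lines of evidence
    pubmed_id: Optional[str] = None  # cross-reference to the source publication
    dbsnp_ids: List[str] = field(default_factory=list)  # cross-references for variants

    def is_complete(self) -> bool:
        """Hypothetical check: a record needs a sequence, a target gene and evidence."""
        return bool(self.sequence and self.target_gene and self.evidence)
```

Keeping evidence as a list mirrors the "one or more lines of experimental evidence" in the abstract; all example values passed to this class should be treated as placeholders.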
-
Text-mining assisted regulatory annotation
GENOME BIOLOGY
2008; 9 (2)
Abstract
Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature. We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process. Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.
View details for DOI 10.1186/gb-2008-9-2-r31
View details for Web of Science ID 000254659300013
View details for PubMedID 18271954
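To show what vector-space relevance ranking of abstracts can look like, here is a minimal sketch. It assumes TF-IDF term weighting, a plain regex tokenizer and ranking by cosine similarity to the centroid of known relevant abstracts. These are common choices, not details stated in the abstract, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9\-]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed so weights stay > 0
    return [{t: c * idf[t] for t, c in counts.items()} for counts in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_abstracts(positives, candidates):
    """Rank candidate abstracts by similarity to the centroid of known relevant ones."""
    vectors = tfidf_vectors(positives + candidates)
    pos_vecs, cand_vecs = vectors[:len(positives)], vectors[len(positives):]
    centroid = Counter()
    for vec in pos_vecs:
        centroid.update(vec)  # Counter.update adds the weights term by term
    centroid = {t: w / len(pos_vecs) for t, w in centroid.items()}
    scores = [(cosine(centroid, v), text) for v, text in zip(cand_vecs, candidates)]
    return sorted(scores, reverse=True)  # highest similarity first
```

In this sketch an abstract that shares no terms with the curated positives scores 0.0 and falls to the bottom of the ranking, which is the behavior a curation queue would want.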
-
Population genomics of human gene expression
NATURE GENETICS
2007; 39 (10): 1217-1224
Abstract
Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.
View details for DOI 10.1038/ng2142
View details for Web of Science ID 000249737400017
View details for PubMedID 17873874
View details for PubMedCentralID PMC2683249
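The cis association analysis summarized above tests each SNP against the expression of nearby genes. A minimal sketch of one such test follows. It assumes a least-squares regression of expression on allele dosage (0, 1, 2) and an empirical p-value from shuffling genotypes. The paper's exact statistics, covariates and significance thresholds may differ, and the function names are hypothetical.

```python
import random
import statistics


def regression_slope(genotypes, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies).

    Raises ZeroDivisionError for a monomorphic SNP, where every dosage is equal.
    """
    mx = statistics.fmean(genotypes)
    my = statistics.fmean(expression)
    sxx = sum((g - mx) ** 2 for g in genotypes)
    sxy = sum((g - mx) * (e - my) for g, e in zip(genotypes, expression))
    return sxy / sxx


def permutation_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Empirical p-value: share of genotype shuffles with |slope| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(regression_slope(genotypes, expression))
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression link
        if abs(regression_slope(shuffled, expression)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Shuffling genotypes while keeping expression fixed preserves the expression distribution, so the null model does not assume normally distributed expression.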
-
A survey of genomic properties for the detection of regulatory polymorphisms
PLOS COMPUTATIONAL BIOLOGY
2007; 3 (6): 1000-1010
Abstract
Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques have been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database (http://www.oreganno.org). We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.
View details for DOI 10.1371/journal.pcbi.0030106
View details for Web of Science ID 000249105500010
View details for PubMedID 17559298
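The abstract reports sensitivity and specificity from 10-fold cross-validation. The sketch below shows how such figures are computed by pooling test-fold predictions. The classifier is a hypothetical stand-in: a single threshold on one feature, such as distance to the transcription start site, placed midway between the class means. The paper's actual classifier and feature set are not specified in the abstract.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Hypothetical one-feature classifier: call 'regulatory' when x <= threshold.

    Assumes each training split contains both classes.
    """
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(xs, ys, k=10, seed=0):
    """Pool predictions over all test folds and return (sensitivity, specificity)."""
    tp = tn = fp = fn = 0
    for test in k_fold_indices(len(xs), k, seed):
        test_set = set(test)
        train = [i for i in range(len(xs)) if i not in test_set]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predicted = xs[i] <= threshold
            if ys[i] and predicted:
                tp += 1
            elif ys[i]:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of known regulatory polymorphisms recovered; specificity is the fraction of the other polymorphisms correctly left out. The model is refit on each training split so no test example influences its own prediction.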
-
ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation
BIOINFORMATICS
2006; 22 (5): 637-640
Abstract
Our understanding of gene regulation is currently limited by our ability to collectively synthesize and catalogue transcriptional regulatory elements stored in scientific literature. Over the past decade, this task has become increasingly challenging as the accrual of biologically validated regulatory sequences has accelerated. To meet this challenge, novel community-based approaches to regulatory element annotation are required. Here, we present the Open Regulatory Annotation (ORegAnno) database as a dynamic collection of literature-curated regulatory regions, transcription factor binding sites and regulatory mutations (polymorphisms and haplotypes). ORegAnno has been designed to manage the submission, indexing and validation of new annotations from users worldwide. Submissions to ORegAnno are immediately cross-referenced to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed, where appropriate. ORegAnno is available directly through MySQL, Web services, and online at http://www.oreganno.org. All software is licensed under the GNU Lesser General Public License (LGPL).
View details for DOI 10.1093/bioinformatics/btk027
View details for Web of Science ID 000235604400024
View details for PubMedID 16397004
-
cisRED: a database system for genome-scale computational discovery of regulatory elements
NUCLEIC ACIDS RESEARCH
2006; 34: D68-D73
Abstract
We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system (www.cisred.org). The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded or installed locally.
View details for DOI 10.1093/nar/gkj075
View details for Web of Science ID 000239307700015
View details for PubMedID 16381958
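cisRED estimates motif significance by running discovery on randomized sequence sets. The sketch below is a heavily simplified illustration of that idea. It assumes an exact-match motif count and a mononucleotide shuffle as the randomization, and reports an empirical p-value. cisRED's adaptively derived random sets and its discovery methods are more elaborate, and the function names here are hypothetical.

```python
import random


def count_motif(seqs, motif):
    """Count exact, possibly overlapping, occurrences of motif across sequences."""
    total = 0
    for s in seqs:
        start = s.find(motif)
        while start != -1:
            total += 1
            start = s.find(motif, start + 1)
    return total


def shuffled_copy(seq, rng):
    """Mononucleotide shuffle: same base composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def empirical_pvalue(seqs, motif, n_random=200, seed=0):
    """Return (observed count, p-value) against shuffled background sets."""
    rng = random.Random(seed)
    observed = count_motif(seqs, motif)
    at_least = 0
    for _ in range(n_random):
        background = [shuffled_copy(s, rng) for s in seqs]
        if count_motif(background, motif) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)
```

Deriving the background from the target sequences themselves keeps base composition fixed, so a motif does not look significant merely because the promoters are AT- or GC-rich.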
-
An application of peer-to-peer technology to the discovery, use and assessment of bioinformatics programs
NATURE METHODS
2005; 2 (8): 563
View details for Web of Science ID 000230884500002
View details for PubMedID 16094378
-
Sockeye: A 3D environment for comparative genomics
GENOME RESEARCH
2004; 14 (5): 956-962
Abstract
Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.
View details for DOI 10.1101/gr.1890304
View details for Web of Science ID 000221171700022
View details for PubMedID 15123592
-
The genome sequence of the SARS-associated coronavirus
SCIENCE
2003; 300 (5624): 1399-1404
Abstract
We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.
View details for DOI 10.1126/science.1085953
View details for Web of Science ID 000183181800036
View details for PubMedID 12730501