Stephen B. Montgomery

Stanford Medicine Professor of Pathology, Professor of Genetics and of Biomedical Data Science and, by courtesy, of Computer Science

Bio

Stephen Montgomery is an Endowed Professor of Pathology, Genetics, Biomedical Data Science and, by courtesy, Computer Science at Stanford University. He has trained in multiple countries, including Canada, Germany, England, and Switzerland. He is best known for his work mapping the effects of genetic variation on gene expression. He authored the first publications that compared whole genomes and transcriptome data within a human population, and he pioneered the use of molecular outliers to identify impactful rare variants (Montgomery et al., 2010; Montgomery et al., 2011).

Montgomery and his lab lead major genomics initiatives to understand the molecular mechanisms that underlie disease-associated variation. In 2017, they published analyses from the Genotype-Tissue Expression (GTEx) Consortium, which analyzed the impact of genetic variation on gene expression across tissues of the human body (GTEx Consortium, 2017). In 2024, his lab led major analyses in the NIH Common Fund MoTrPAC study, identifying the molecular effects of exercise training across rat tissues (MoTrPAC, 2024). He is a Principal Investigator within multiple major NIH consortia, including the GREGoR, MoTrPAC, TOPMed and Functional ADSP consortia, and an Investigator in the Developmental GTEx, IGVF, SMaHT, All of Us, Undiagnosed Diseases Network and ENCODE4 consortia, demonstrating his lab's ongoing impact on multiple major genomics projects.

The Montgomery lab has a specific focus on mapping the molecular effects of rare and environment-responsive genetic variants. Work in his laboratory focuses on developing approaches for studying rare variants (such as Li et al., 2017; Ferraro et al., 2020) and on applying these approaches to understand novel disease biology and to diagnose individuals with genetic diseases (Fresard et al., 2017). As a PI of the GREGoR Stanford Site, his lab develops and applies these strategies to diagnose individuals with undiagnosed, rare diseases. The GREGoR Stanford Site is currently recruiting 500 families with unsolved diagnoses in California to apply novel multi-omics and computational strategies to achieve diagnoses. His laboratory also has a specific focus on understanding the molecular consequences of structural variants and chromosomal copy number changes (Marderstein et al., 2024).

The Montgomery lab is also focused on advancing our understanding of common genetic variants and understudied RNAs. For example, his lab has demonstrated that multiple genetic variants contribute to genetic disease associations (Abell et al., 2022) and has developed approaches to identify impactful long non-coding RNAs that contribute to complex disease (de Goede et al., 2021). Ongoing efforts in his lab focus on neurodegenerative and neurodevelopmental traits.

Montgomery is an active member of both the Stanford and broader research communities. Among his contributions, he serves as a co-director of an NHGRI PhD T32 Training Grant and as Faculty Director of Graduate Admissions for the Biomedical Data Science program, and he served for four years as a Stanford University Faculty Senator. He has served, or currently serves, on the program committees of major conferences such as ASHG, AGBT and WTSI Genomics of Rare Diseases. He is the incoming chair of the ASHG Awards Committee. He is also a standing member of the NIH GHD Study Section.

In 2019, Montgomery was awarded the annual American Society of Human Genetics Early Career Award for his multi-faceted impacts on human genetics and genomics. In 2023, he was awarded the annual Stanford Prize in Population Genetics and Society. In 2024, he was awarded the Stanford Pathology Research Mentor Award.

Academic Appointments

Professor, Pathology
Professor, Genetics
Professor, Department of Biomedical Data Science
Professor (By courtesy), Computer Science
Member, Bio-X
Member, Cardiovascular Institute
Member, Wu Tsai Human Performance Alliance
Member, Maternal & Child Health Research Institute (MCHRI)
Member, Stanford Medicine Children’s Health Center for IBD and Celiac Disease
Member, Wu Tsai Neurosciences Institute

Administrative Appointments

Director of Genome Informatics, Department of Pathology (2011 - Present)

Professional Education

B.A.Sc., University of British Columbia, Engineering Physics (2002)
Ph.D., University of British Columbia, Genetics (2006)

Current Research and Scholarly Interests

We focus on understanding the effects of genome variation on cellular phenotypes and on cellular modeling of disease. We use genomic approaches such as next-generation RNA sequencing, in combination with developing and applying state-of-the-art bioinformatics and statistical genetics methods. See our website at http://montgomerylab.stanford.edu/

2025-26 Courses

Practical Application of AI/ML to Healthcare and Biotechnology
BMDS 283 (Spr)
Independent Studies (23)
- Advanced Reading and Research
  CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
  CS 499P (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods
  BMDS 295 (Aut, Win, Spr)
- Curricular Practical Training
  CS 390A (Aut, Win, Spr, Sum)
- Directed Investigation
  BIOE 392 (Aut, Win, Spr, Sum)
- Directed Reading
  BMDS 299 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics
  GENE 299 (Aut, Win, Spr, Sum)
- Directed Reading in Pathology
  PATH 299 (Aut, Win, Spr, Sum)
- Directed Study
  BIOE 391 (Aut, Win, Spr, Sum)
- Early Clinical Experience in Pathology
  PATH 280 (Aut, Win, Spr, Sum)
- Graduate Research
  GENE 399 (Aut, Win, Spr, Sum)
- Graduate Research
  IMMUNOL 399 (Aut, Sum)
- Graduate Research
  PATH 399 (Aut, Win, Spr, Sum)
- Independent Project
  CS 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
  BMDS 370 (Aut, Win, Spr)
- Medical Scholars Research
  GENE 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
  PATH 370 (Aut, Win, Spr, Sum)
- Practical Training
  BIOE 299B (Sum)
- Senior Project
  CS 191 (Aut, Win, Spr)
- Supervised Study
  GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research
  GENE 199 (Aut, Win, Spr, Sum)
- Undergraduate Research
  PATH 199 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Research Project
  CS 191W (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Informatics in Industry
  BIOMEDIN 206 (Spr)
2022-23 Courses
- Informatics in Industry
  BIOMEDIN 206 (Spr)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Vidal Arroyo, Jon Bezney, Ziwei Chen, Tania Fabo, Karen Feng, Michael Hayes, Jodie Lunger, Imani Porter, Taylor Pursell, Alp Tartici
Postdoctoral Faculty Sponsor
Evin Padhi, Yilin Xie
Doctoral Dissertation Advisor (AC)
Maggie Arriaga, Sohaib Hassan, Ronit Jain, Julie Lake, Kate Lawrence, Victoria Rosa, Sherry Yang
Doctoral Dissertation Co-Advisor (AC)
Jordan Cahoon, Haim Krupkin
Master's Program Advisor
Nate Demchak, Daniel Guo, Patrick Walsh
Doctoral (Program)
Alex Belov, Sophia Kivelson, Esther Robb, Min Sun, Christine Yiwen Yeh

Graduate and Fellowship Programs

Biomedical Data Science (PhD Program)
Genetics (PhD Program)

All Publications

Publisher Correction: The impact of exercise on gene regulation in association with complex trait genetics. Nature communications Vetr, N. G., Gay, N. R., Montgomery, S. B. 2026; 17 (1)

View details for DOI 10.1038/s41467-026-72505-6

View details for PubMedID 42045212

View details for PubMedCentralID PMC13121622
The Long Non-coding RNA Landscape of Endurance Exercise Training. Molecular metabolism Bonilauri, B., Smith, G. R., Raja, A. N., Jimenez-Morales, D., Ahmed, A., Jin, C., Sparks, L. M., Walsh, M. J., Montgomery, S. B., Bodine, S. C., Ashley, E. A., Lindholm, M. E. 2026: 102358

Abstract

Long non-coding RNAs (lncRNAs) regulate multiple cellular processes. However, knowledge of the responses and regulatory functions of lncRNAs in physical exercise and training remains limited. As part of the Molecular Transducers of Physical Activity Consortium (MoTrPAC), we conducted a comprehensive analysis of lncRNA expression patterns in 18 tissues after an 8-week progressive endurance training program in rats. The lncRNA expression pattern was largely tissue-specific. In total, 759 unique lncRNAs were found to be differentially expressed across all tissues, generally displaying lower abundance, shorter transcript length, and reduced GC content compared with protein-coding genes. The most pronounced changes were observed in white and brown adipose tissues, the hypothalamus, and the adrenal gland. In the two skeletal muscle tissues investigated, only two lncRNAs were commonly differentially expressed. White and brown adipose tissues revealed a correlation between upregulated differentially expressed lncRNAs and coding genes associated with immune regulation. We identified substantial sex differences in the lncRNA regulatory landscape in response to exercise training. This comprehensive tissue-specific characterization of exercise-responsive lncRNAs opens new avenues for understanding exercise as molecular medicine and may inform the development of lncRNA-targeted therapeutics that harness the beneficial effects of exercise.

View details for DOI 10.1016/j.molmet.2026.102358

View details for PubMedID 42019922
De Novo Variants Associated With Autosomal Recessive Conditions: Case Series and Implications for Genetic Testing and Counseling. American journal of medical genetics. Part A Niehaus, A. D., Bonner, D. E., Carter, J., Avello, K., Jacob, N., Neu, M. B., Mendez, R., Qiao, W., Scott, S. A., Levy, R. J., Mattas, L., Schymick, J., Van Andel, M., Muntoni, F., Mueller, J., Sarkozy, A., DiTroia, S., O'Leary, M., Neale, A., O'Donnell-Luria, A., Toro, C., Wolfe, L. A., Martinez-Agosto, J. A., Montgomery, S. B., Wheeler, M. T., Bernstein, J. A., Tise, C. G. 2026

Abstract

The vast majority of individuals with autosomal recessive (AR) conditions demonstrate biparental inheritance of the disease-causing alleles; however, de novo variants also contribute to AR disease. This report represents the largest cohort to-date of rare AR conditions in which one of the disease-causing alleles was inherited and one occurred de novo. Clinical and research staff at Stanford University, clinical sites of the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) Consortium, and a large clinical genetic testing laboratory were contacted to identify cases of an AR diagnosis resulting from an inherited and de novo disease-causing variant in trans. Fifteen cases of AR conditions caused by one inherited and one de novo variant in a gene consistent with the clinical phenotype were identified; all had undergone trio exome or genome sequencing with genetic confirmation of reported relationships. Variants were confirmed to be in trans in eight of the 15 cases. The de novo variant was confirmed (n = 7) or presumed (n = 7) to have arisen on the paternal allele in 14/15 (93%) of cases. Phenotypic and/or molecular evidence of an AR condition should prompt parental segregation analysis to inform diagnosis, recurrence risks, and variant classification. Additional studies are needed to determine the incidence of this phenomenon given the implications for the interpretation of genetic testing and counseling for AR conditions.

View details for DOI 10.1002/ajmg.a.70162

View details for PubMedID 42002855
Multi-ancestry transcriptome prediction with functionally informed variants in TOPMed MESA improves performance of transcriptome-wide association studies. American journal of human genetics Hu, X., Araujo, D. S., Khunsriraksakul, C., Wang, L., Sun, Q., Wen, J., Zhou, L., Ekunwe, L., Lange, L. A., Lange, E. M., Montgomery, S. B., Reiner, A. P., Aguet, F., Ardlie, K. G., Lappalainen, T., Gignoux, C. R., Burchard, E. G., Taylor, K. D., Guo, X., Rotter, J. I., Rich, S. S., Cornell, E., Durda, P., Tracy, R. P., Liu, Y., Johnson, W. C., Papanicolaou, G. P., Perera, M. A., Cho, M. H., Liu, D. J., Raffield, L. M., Li, Y., Wheeler, H. E., Im, H. K., Manichaikul, A. 2026; 113 (4): 828-841

Abstract

Reliable reference transcriptome prediction models are key to accurate multi-ancestry transcriptome-wide association studies (TWASs). We propose three methods leveraging functionally informed variants (FIVs) for transcriptome prediction models to improve multi-ancestry TWASs. We trained models on 1,287 multi-ancestry participants from the Trans-Omics for Precision Medicine (TOPMed) program Multi-Ethnic Study of Atherosclerosis (MESA) with RNA sequencing (RNA-seq) data from peripheral blood mononuclear cells (PBMCs). We validated models' prediction accuracy on two external independent datasets, Geuvadis and Jackson Heart Study. To test robustness of our methods for TWASs, we integrated models with three multi-ancestry GWASs from blood cell, lipid, and pulmonary function traits, respectively. Our methods presented similar prediction accuracy while using a smaller and functionally informed set of variants compared to the benchmark method, elastic net (EN). Overall, our methods achieved higher power and accuracy (with average improved accuracy of 24% over EN) for TWASs. However, no single proposed method outperformed all GWAS traits. To further improve TWAS performance, we propose an omnibus approach that aggregates TWAS summary statistics from our methods. The omnibus approach yielded the highest number of Bonferroni-significant TWAS genes for all GWAS traits, and it further improved TWAS power and accuracy for blood cell traits. Additionally, the omnibus approach detected some trait-relevant important genes that the EN missed. Our study demonstrates the value of including FIVs in multi-ancestry transcriptome prediction models for improving TWAS performance. Further, the observed TWAS improvement depends on the GWAS trait's relevance to the PBMCs used to build our transcriptome prediction models.

View details for DOI 10.1016/j.ajhg.2026.03.008

View details for PubMedID 41932314
Focus on single-gene effects limits discovery and interpretation of complex-trait-associated variants. American journal of human genetics Lawrence, K. A., Gjorgjieva, T., Nachun, D., Montgomery, S. B. 2026

Abstract

Standard quantitative trait locus (QTL) mapping approaches consider variant effects on a single gene at a time, despite abundant evidence of allelic pleiotropy, where a single variant can affect multiple genes simultaneously. While allelic pleiotropy describes variant effects on both local and distal genes or a mixture of molecular effects on a single gene, here, we specifically investigate allelic expression "proxitropy," where a single variant influences the expression of multiple, neighboring genes. We introduce a multi-gene expression QTL (eQTL) mapping framework, cis-principal-component eQTL (cis-pc eQTL or pcQTL), to identify variants associated with shared axes of expression variation across a cluster of neighboring genes. We perform pcQTL mapping in 13 GTEx human tissues and discover novel loci undetected by single-gene approaches. In total, we identify an average of 1,396 pcQTLs/tissue, 27% of which were not discovered by single-gene methods. These novel pcQTLs colocalized with an additional 176 genome-wide association study (GWAS) trait-associated variants and increased the number of colocalizations by 33% over single-gene QTL mapping. These findings highlight the idea that moving beyond single-gene-at-a-time approaches toward multi-gene methods can offer a more comprehensive view of gene regulation and complex-trait-associated variation.

View details for DOI 10.1016/j.ajhg.2026.02.022

View details for PubMedID 41875896
Multi-omic identification of key transcriptional regulatory programs during endurance exercise training in rats. Nature communications Smith, G. R., Zhao, B., Lindholm, M. E., Raja, A., Viggars, M., Pincas, H., Gay, N. R., Sun, Y., Vangeti, S., Ge, Y., Nair, V. D., Sanford, J. A., Amper, M. A., Vasoya, M., Smith, K. S., Ramos, I., Montgomery, S. B., Zaslavsky, E., Bodine, S. C., Esser, K. A., Walsh, M. J., Snyder, M. P., Sealfon, S. C. 2026

Abstract

Transcription factors play a key role in regulating gene expression. We conduct an integrated analysis of chromatin accessibility, DNA methylation, mRNA expression, protein abundance and phosphorylation across eight tissues in fifty rats of equally represented sexes following endurance exercise training to identify coordinated epigenomic and transcriptional changes and determine key transcription factors involved. We uncover tissue-specific endurance exercise training associated changes and transcription factor motif enrichment across differentially expressed genes, accessible regions, and methylated regions. We discover distinct routes of training-induced regulation through either epigenomic alterations providing better access for transcription factors to affect target genes, or via changes in transcription factor expression or activity enabling target gene responses. We identify transcription factor motifs enriched among correlated epigenomic and transcriptomic alterations, differentially expressed genes correlated with exercise-related phenotypic and cell type composition changes, and training-induced activity changes of transcription factors whose target genes are enriched for differentially expressed genes. This analysis elucidates the unique gene regulatory mechanisms mediating diverse transcriptional responses to training across tissues.

View details for DOI 10.1038/s41467-026-70397-0

View details for PubMedID 41862462
Biallelic Variants in RNU6ATAC Result in a Minor Spliceopathy Characterized by Transcriptome-Wide Minor Intron Retention Events and Short Stature with Variable Multisystem Manifestations. HGG advances Mendez, R., Arriaga, T. M., Ma, J., Bonner, D. E., Emami, S., Levy, R. J., Alsagheir, A., Alhaddad, B., Bakur, K., Ungar, R. A., Matalon, D. R., Miller, A. M., Nguyen, J., Smith, K. S., Scott, S. A., Liao, L., Ng, Z., Marwaha, S., Ward, A., Novacic, D., Alkuraya, F. S., Bernstein, J. A., Ganesh, V. S., O'Donnell-Luria, A., Montgomery, S. B., Wheeler, M. T. 2026: 100588

Abstract

We report three individuals with biallelic variants in RNU6ATAC, which encodes the U6atac minor spliceosomal small nuclear RNA (snRNA), causing a multisystem minor spliceopathy. Through RNAseq analysis, we identified a distinctive excess of minor intron retention (MIR) in two unrelated individuals, which guided the identification of biallelic RNU6ATAC variants. The discovery cohort presented with variable multisystem manifestations. One individual presented with refractory epilepsy, microcephaly, developmental delay, ataxia, bilateral toe syndactyly, hypereosinophilia, and short stature, whereas the other exhibited failure to thrive, short stature, primary hypothyroidism, combined variable immunodeficiency, eosinophilic colitis, ichthyosis vulgaris, scoliosis, and chronic inflammatory demyelinating polyneuropathy without neurodevelopmental involvement. Despite organ-specific variation, both individuals displayed impaired growth and eosinophil-driven inflammation. Recently, we identified a third affected individual from an independent cohort whose phenotype bridges these features, combining microcephaly, growth failure with severe immunodeficiency, and skeletal abnormalities. The distinctive excess of MIR outliers in the discovery cohort supports minor spliceosome dysfunction, mirroring the molecular signature of RNU4ATAC-opathy. These findings nominate RNU6ATAC as a disease-associated gene, defining an expanded clinical spectrum of minor spliceopathies. Our study supports the power of integrating genomic and transcriptomic approaches for diagnosing splicing disorders and highlights the critical role of spliceosomal snRNAs in human disease.

View details for DOI 10.1016/j.xhgg.2026.100588

View details for PubMedID 41808409
Biallelic LAMP3 Variants in Five Families with Interstitial Lung Disease: Evidence of a Disease-Gene Association. Genetics in medicine : official journal of the American College of Medical Genetics Keehan, L. A., Ono-Minagi, H., Hadhud, M., Rips, J., Hinds, D. M., Fischer, A. J., Bartlett, J. A., McCray, P. B., Qawasmi, N., Nathan, N., Louvrier, C., Desroziers, T., Damme, M., Griese, M., Wegner, D. J., Cole, F. S., Wambach, J. A., Wheeler, M. T., Burbelo, P. D., Bonner, D. E., Bernstein, J. A., Chiorini, J. A., Breuer, O., Milla, C. 2026: 102531

Abstract

Genetic causes of surfactant dysfunction are associated with childhood interstitial lung disease (chILD). Lysosome-associated membrane glycoprotein 3 (LAMP3) is highly expressed within lamellar bodies of alveolar epithelial type II cells, and variants in LAMP3 have recently been suggested as a novel cause of chILD. This study describes the phenotypes of participants with biallelic variants in LAMP3 and presents functional studies evaluating the role of specific LAMP3 variants. Phenotypic data were collected through chart review and clinical evaluation. In vitro effects of LAMP3 variants were evaluated through immunohistochemistry, Western blot, and flow cytometry. Thirteen participants were identified with biallelic variants in LAMP3. They presented with variable phenotypes ranging from neonatal respiratory distress to asymptomatic in adulthood. All symptomatic participants demonstrated ground glass opacities early in life and lung fibrosis later in life. For one participant, bronchoalveolar lavage (BAL) analysis showed abnormal surfactant protein composition and lung biopsy revealed irregular lamellar bodies. In vitro studies in lung epithelial cells with induced expression of specific LAMP3 variants demonstrated reduced protein expression and abnormal glycosylation. Biallelic LAMP3 variants are associated with an interstitial lung disease phenotype with variable expressivity. Evaluation for LAMP3 variants should be considered in individuals with unexplained interstitial lung disease.

View details for DOI 10.1016/j.gim.2026.102531

View details for PubMedID 41653023
Multi-omics analysis of endurance exercise reveals cardioprotective remodeling in rat heart Brochet, P., Njoroge, J., Montalvo Hernandez, S., Lindholm, M., Smith, G., Amar, D., Gay, N., Zhao, B., Hung, C., Jin, C., Chavez, C., Nachun, D., Zaslavsky, E., Nudelman, G., Pincas, H., Armenteros, J., Smith, K., Hennig, K., Amper, M., Wolf, M., Vasoya, M., Bararpour, N., Ge, Y., Rasmussen, B., Walsh, M., Snyder, M., Montgomery, S., Sealfon, S., Kraus, W., Yan, Z., Ashley, E., Katz, D., Wheeler, M. LIPPINCOTT WILLIAMS & WILKINS. 2025

View details for DOI 10.1161/circ.152.suppl_3.4367025

View details for Web of Science ID 001613792400012
Long-read, multi-omics resource uncovers structural variants driving molecular trait associations and neurodegenerative disease risk Jensen, T., Le Guen, Y., Talozzi, L., Yang, S., Gorzynski, J., Tauber, A. P., Ashley, E. A., Montgomery, S., Greicius, M. D. SPRINGERNATURE. 2025: 52

View details for Web of Science ID 001671157900111
GREGoR: accelerating genomics for rare diseases. Nature Dawood, M., Heavner, B., Wheeler, M. M., Ungar, R. A., LoTempio, J., Wiel, L., Berger, S., Bernstein, J. A., Chong, J. X., Délot, E. C., Eichler, E. E., Lupski, J. R., Shojaie, A., Talkowski, M. E., Wagner, A. H., Wei, C. L., Wellington, C., Wheeler, M. T., Carvalho, C. M., Gibbs, R. A., Gifford, C. A., May, S., Miller, D. E., Rehm, H. L., Samocha, K. E., Sedlazeck, F. J., Vilain, E., O'Donnell-Luria, A., Posey, J. E., Chadwick, L. H., Bamshad, M. J., Montgomery, S. B. 2025; 647 (8089): 331-342

Abstract

Rare diseases are collectively common, affecting approximately 1 in 20 individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in next-generation sequencing, development of new computational and functional genomics approaches to prioritize genes and variants and increased global sharing of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Furthermore, all data generated, currently representing over 7,500 individuals from over 3,000 families, are rapidly made available to researchers worldwide through the Analysis, Visualization and Informatics Lab-space (AnVIL) to catalyse global efforts to develop approaches for genetic diagnoses in rare diseases. Most of these families have undergone previous clinical genetic testing but remained unsolved, with most being exome-negative. Here we describe the collaborative research framework, datasets and discoveries comprising GREGoR that will provide foundational resources and substrates for the future of rare disease genomics.

View details for DOI 10.1038/s41586-025-09613-8

View details for PubMedID 41224980

View details for PubMedCentralID 9119004
Long-read genome sequencing and multi-omics in aging and neurodegeneration. medRxiv : the preprint server for health sciences Jensen, T. D., Le Guen, Y., Talozzi, L., Yang, S., Gorzynski, J., Peña-Tauber, A., Stewart, I., Ferrasse, A., Nachun, D., Arriaga, M. T., Lee, J., Pulgrossi, R. C., Park, J., Zhang, J., Wagner, A. D., Mormino, E. C., Poston, K. L., Henderson, V. W., He, Z., Wyss-Coray, T., Montgomery, S. B., Ashley, E. A., Greicius, M. D. 2025

Abstract

Structural variants (SVs) are a major source of genetic variation yet remain underexplored in healthy aging and neurodegenerative diseases. We performed nanopore long-read genome sequencing (lrGS) on 551 deeply-phenotyped individuals from Stanford's Aging and Memory Study and Alzheimer's Disease Research Center, generating a comprehensive SV map integrated with matched methylation, transcriptomic, and proteomic data. Over 60% of SVs identified by lrGS were not detected with short-read WGS, including many poorly tagged by single-nucleotide variants (SNVs). We discovered >60,000 SV-QTLs across molecular traits and showed that SVs were more likely than SNVs to be fine-mapped as causal. Colocalization with Alzheimer's and Parkinson's disease GWAS implicated SVs at multiple loci, including TMEM106B, BIN3, and NBEAL1. Multi-omic outlier enrichment and Bayesian modeling prioritized rare functional SVs near known risk genes. Combined, these data reveal widespread regulatory SVs in healthy aging and neurodegeneration, underscoring the importance of lrGS in deciphering complex genetic architecture.

View details for DOI 10.1101/2025.10.10.25337775

View details for PubMedID 41282933

View details for PubMedCentralID PMC12633103
The Long Non-coding RNA Landscape of Endurance Exercise Training. bioRxiv : the preprint server for biology Bonilauri, B., Smith, G. R., Raja, A. N., Jimenez-Morales, D., Ahmed, A., Jin, C., Sparks, L. M., Walsh, M. J., Montgomery, S. B., Bodine, S. C., Ashley, E. A., Lindholm, M. E. 2025

Abstract

Long non-coding RNAs (lncRNAs) regulate multiple cellular processes. However, knowledge of the responses and regulatory functions of lncRNAs in physical exercise and training remains limited. As part of the Molecular Transducers of Physical Activity Consortium (MoTrPAC), we conducted a comprehensive analysis of lncRNA expression patterns in 18 tissues after an 8-week progressive endurance training program in rats. The lncRNA expression pattern was largely tissue-specific. In total, 759 unique lncRNAs were found to be differentially expressed across all tissues, generally displaying lower abundance, shorter transcript length, and reduced GC content compared with protein-coding genes. The most pronounced changes were observed in white and brown adipose tissues, the hypothalamus, and the adrenal gland. In the two skeletal muscle tissues investigated, only two lncRNAs were commonly differentially expressed. White and brown adipose tissues revealed a correlation between upregulated differentially expressed lncRNAs and coding genes associated with immune regulation. We identified substantial sex differences in the lncRNA regulatory landscape in response to exercise training. This comprehensive tissue-specific characterization of exercise-responsive lncRNAs opens new avenues for understanding exercise as molecular medicine and may inform the development of lncRNA-targeted therapeutics that harness the beneficial effects of exercise.

View details for DOI 10.1101/2025.10.09.681231

View details for PubMedID 41278731

View details for PubMedCentralID PMC12632385
Mapping causal non-coding variants in coronary artery disease. Nature cardiovascular research Montgomery, S. B. 2025

View details for DOI 10.1038/s44161-025-00715-0

View details for PubMedID 41057607

View details for PubMedCentralID 9484647
Regulatory genomics at biobank scales. Nature reviews. Genetics Montgomery, S. B. 2025; 26 (10): 657-658

View details for DOI 10.1038/s41576-025-00879-2

View details for PubMedID 40957940

View details for PubMedCentralID 12270542
Transcriptome-wide outlier approach identifies individuals with minor spliceopathies. American journal of human genetics Arriaga, T. M., Mendez, R., Ungar, R. A., Bonner, D. E., Matalon, D. R., Lemire, G., Goddard, P. C., Padhi, E. M., Miller, A. M., Nguyen, J. V., Ma, J., Smith, K. S., Scott, S. A., Liao, L., Ng, Z., Marwaha, S., Bademci, G., Bivona, S. A., Tekin, M., Bernstein, J. A., Montgomery, S. B., O'Donnell-Luria, A., Wheeler, M. T., Ganesh, V. S. 2025

Abstract

RNA sequencing has improved the diagnostic yield of individuals with rare diseases. Current analyses predominantly focus on identifying outliers in single genes that can be attributed to cis-acting variants within the gene locus. This approach overlooks causal variants with trans-acting effects on splicing transcriptome wide, such as variants impacting spliceosome function. We present a transcriptomics-first method to diagnose individuals with rare diseases by examining transcriptome-wide patterns of splicing outliers. Using splicing outlier detection methods (FRASER and FRASER2), we characterized splicing outliers from whole blood for 385 individuals from the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) and Undiagnosed Diseases Network (UDN) consortia. We examined all individuals for excess intron retention outliers in minor intron-containing genes (MIGs). Minor introns, which account for 0.5% of all introns in the human genome, are removed by small nuclear RNAs (snRNAs) in the minor spliceosome. This approach identified five individuals with excess intron retention outliers in MIGs, all of whom were found to harbor rare, bi-allelic variants in minor spliceosome snRNAs. Four individuals had rare, compound heterozygous variants in RNU4ATAC, which aided the reclassification of four variants. Additionally, one individual had rare, highly conserved, compound heterozygous variants in RNU6ATAC that may disrupt the formation of the catalytic spliceosome, suggesting it is a gene associated with Mendelian disease. These results demonstrate that examining RNA-sequencing data for transcriptome-wide signatures can increase the diagnostic yield of individuals with rare diseases, provide variant-to-function interpretation of spliceopathies, and uncover gene-disease associations.

View details for DOI 10.1016/j.ajhg.2025.08.018

View details for PubMedID 40975062
Interactions Between Dietary Metabolites and Regulatory Risk Variants for Human Colon Cancer. bioRxiv : the preprint server for biology Fabo, T. N., Meyers, R. M., Padhi, E., Kellman, L. N., Zhao, Y., Kundu, S., Reynolds, D. L., Chen, Z., Yang, X., Ko, L., Elfaki, I., Montgomery, S. B., Khavari, P. A. 2025

Abstract

Interactions between genetic variants and environmental factors influence malignancy risk, including for colorectal cancer (CRC). Prevalent CRC susceptibility loci reside predominantly in noncoding regulatory DNA where they may interact with dietary influences to dysregulate expression of specific genes predisposing to neoplasia. The impacts of CRC protective and risk dietary metabolites, butyrate and deoxycholic acid, were thus studied on the transcription-directing activity of 3703 regulatory CRC-associated variants via massively parallel reporter assays (MPRA) in human colonic cells. 1595 variant-dietary metabolite interactions were identified, pointing to dysregulation of MED13L, NKD2, and several modulators of Wnt/β-catenin signaling in potential CRC gene-environment interactions (GxE). Opposing impacts of butyrate and deoxycholic acid were also uncovered, indicating dietary influences may converge on common CRC risk loci and nominating FOSL1 and SP1 as mediators of these opposing responses. Coupling MPRA to relevant environmental factors offers an approach to extend insight into GxE in common human cancers.

View details for DOI 10.1101/2025.09.05.674475

View details for PubMedID 40964363

View details for PubMedCentralID PMC12439979
Disruption of the cerebrospinal fluid-plasma protein balance in cognitive impairment and aging. Nature medicine Farinas, A., Rutledge, J., Bot, V. A., Western, D., Ying, K., Lawrence, K. A., Oh, H. S., Yoon, S., Ding, D. Y., Tsai, A. P., Moran-Losada, P., Timsina, J., Le Guen, Y., Montgomery, S. B., Baker, D., Poston, K. L., Wagner, A. D., Mormino, E., Cruchaga, C., Wyss-Coray, T. 2025

Abstract

The brain barrier system, including the choroid plexus, meninges and brain vasculature, regulates substrate transport and maintains differential protein concentrations between blood and cerebrospinal fluid (CSF). Aging and neurodegeneration disrupt brain barrier function, but proteomic studies of the effects on blood-CSF protein balance are limited. Here we used SomaScan proteomics to characterize paired CSF and plasma samples from 2,171 healthy or cognitively impaired older individuals from multiple cohorts, including the Global Neurodegeneration Proteomics Consortium. We identified proteins with correlated CSF and plasma levels that are produced primarily outside the brain and are enriched for structural domains that may enable their transport across brain barriers. CSF to plasma ratios of 848 proteins increased with aging in healthy control individuals, including complement and coagulation proteins, chemokines and proteins linked to neurodegeneration, whereas 64 protein ratios decreased with age, suggesting substrate-specific barrier regulation. Notably, elevated CSF to plasma ratios of peripherally derived or vascular-associated proteins, including DCUN1D1, MFGE8 and VEGFA, were associated with preserved cognitive function. Genome-wide association studies identified genetic loci associated with CSF to plasma ratios of 241 proteins, many of which have known disease associations, including FCN2, the collagen-like domain of which may facilitate blood-CSF transport. Overall, this work provides molecular insight into the human brain barrier system and its disruption with age and disease, with implications for the development of brain-permeable therapeutics.

View details for DOI 10.1038/s41591-025-03831-3

View details for PubMedID 40665050

View details for PubMedCentralID 4015335
The Somatic Mosaicism across Human Tissues Network. Nature Coorens, T. H., Oh, J. W., Choi, Y. A., Lim, N. S., Zhao, B., Voshall, A., Abyzov, A., Antonacci-Fulton, L., Aparicio, S., Ardlie, K. G., Bell, T. J., Bennett, J. T., Bernstein, B. E., Blanchard, T. G., Boyle, A. P., Buenrostro, J. D., Burns, K. H., Chen, F., Chen, R., Choudhury, S., Doddapaneni, H. V., Eichler, E. E., Evrony, G. D., Faith, M. A., Fazzio, T. G., Fulton, R. S., Garber, M., Gehlenborg, N., Germer, S., Getz, G., Gibbs, R. A., Hernandez, R. G., Jin, F., Korbel, J. O., Landau, D. A., Lawson, H. A., Lennon, N. J., Li, H., Li, Y., Loh, P. R., Marth, G., McConnell, M. J., Mills, R. E., Montgomery, S. B., Natarajan, P., Park, P. J., Satija, R., Sedlazeck, F. J., Shao, D. D., Shen, H., Stergachis, A. B., Underhill, H. R., Urban, A. E., VonDran, M. W., Walsh, C. A., Wang, T., Wu, T. P., Zong, C., Lee, E. A., Vaccarino, F. M. 2025; 643 (8070): 47-59

Abstract

From fertilization onwards, the cells of the human body acquire variations in their DNA sequence, known as somatic mutations. These postzygotic mutations arise from intrinsic errors in DNA replication and repair, as well as from exposure to mutagens. Somatic mutations have been implicated in some diseases, but a fundamental understanding of the frequency, type and patterns of mutations across healthy human tissues has been limited. This is primarily due to the small proportion of cells harbouring specific somatic variants within an individual, making them more challenging to detect than inherited variants. Here we describe the Somatic Mosaicism across Human Tissues Network, which aims to create a reference catalogue of somatic mutations and their clonal patterns across 19 different tissue sites from 150 non-diseased donors and develop new technologies and computational tools to detect somatic mutations and assess their phenotypic consequences, including clonal expansions. This strategy enables a comprehensive examination of the mutational landscape across the human body, and provides a comparison baseline for somatic mutation in diseases. This will lead to a deep understanding of somatic mutations and clonal expansions across the lifespan, as well as their roles in health, in ageing and, by comparison, in diseases.

View details for DOI 10.1038/s41586-025-09096-7

View details for PubMedID 40604182

View details for PubMedCentralID 9402379
Toward optimizing diversifying base editors for high-throughput mutational scanning studies. Nucleic acids research Schwartz, C. I., Abell, N. S., Li, A., , Tycko, J., Truong, A., Montgomery, S. B., Hess, G. T. 2025; 53 (12)

Abstract

Base editors, including diversifying base editors that create C>N mutations, are potent tools for systematically installing point mutations in mammalian genomes and studying their effect on cellular function. Numerous base editor options are available for such studies, but little information exists on how the composition of the editor (deaminase, recruitment method, and fusion architecture) affects editing. To address this knowledge gap, the effect of various design features, such as deaminase recruitment and delivery method (electroporation or lentiviral transduction), on editing was assessed across ∼200 synthetic target sites. The direct fusion of a hyperactive variant of activation-induced cytidine deaminase to the N-terminus of dCas9 (DivA-BE) produced the highest editing efficiency, ∼4-fold better than the previous CRISPR-X method. Additionally, DivA-BE mutagenized the DNA strand that anneals to the targeting sgRNA (target strand) to create complementary C>N mutations, which were absent when the deaminase was fused to the C-terminus of dCas9. Based on these studies that comprehensively analyze the editing patterns of several popular base editors, DivA-BE editors efficiently diversified their target sites, albeit with increased indel frequencies. Overall, the improved editing efficiency makes the DivA-BE editors ideal for discovering functional variants in mutational scanning assays.

View details for DOI 10.1093/nar/gkaf620

View details for PubMedID 40613705
The evolutionarily conserved PRP4K-CHMP4B/vps32 splicing circuit regulates autophagy. Cell reports Mathavarajah, S., Chipurupalli, S., Habib, E. B., Kim, W. D., Aoki, M. M., Corkery, D. P., Whelan, K. I., Lukacs, J., Erkan, M., Martinez, V. D., Smith, K. S., Montgomery, S. B., Salsman, J., Huber, R. J., Dellaire, G. 2025; 44 (7): 115870

Abstract

The pre-mRNA processing factor 4 kinase (PRP4K) is an essential gene in animal cells, making interrogation of its function challenging. Here, we report characterization of a viable knockout model of PRP4K in the social amoeba Dictyostelium discoideum, revealing a function for PRP4K in splicing events controlling autophagy. When prp4k knockout amoebae undergo multicellular development, we observe defects in differentiation linked to abnormal autophagy and aberrant secretion of stalk cell inducer c-di-GMP. Autophagosome-lysosome fusion is impaired after PRP4K loss in both human cell lines and amoebae. PRP4K loss results in mis-splicing and reduced expression of the ESCRT-III gene CHMP4B in human cells and its ortholog vps32 in Dictyostelium, and re-expression of CHMP4B or Vps32 cDNA (respectively) restores normal autophagosome-lysosome fusion in PRP4K-deficient cells. Thus, our work reveals a PRP4K-CHMP4B/vps32 splicing circuit regulating autophagy that is conserved over at least 600 million years of evolution.

View details for DOI 10.1016/j.celrep.2025.115870

View details for PubMedID 40531620
Focus on single gene effects limits discovery and interpretation of complex trait-associated variants. bioRxiv : the preprint server for biology Lawrence, K., Gjorgjieva, T., Montgomery, S. B. 2025

Abstract

Standard QTL mapping approaches consider variant effects on a single gene at a time, despite abundant evidence for allelic pleiotropy, where a single variant can affect multiple genes simultaneously. While allelic pleiotropy describes variant effects on both local and distal genes or a mixture of molecular effects on a single gene, here we specifically investigate allelic expression "proxitropy": where a single variant influences the expression of multiple, neighboring genes. We introduce a multi-gene eQTL mapping framework, cis-principal component expression QTL (cis-pc eQTL or pcQTL), to identify variants associated with shared axes of expression variation across a cluster of neighboring genes. We perform pcQTL mapping in 13 GTEx human tissues and discover novel loci undetected by single-gene approaches. In total, we identify an average of 1396 pcQTLs/tissue, 27% of which were not discovered by single-gene methods. These novel pcQTL colocalized with an additional 142 GWAS trait-associated variants and increased the number of colocalizations by 34% over single-gene QTL mapping. These findings highlight that moving beyond single-gene-at-a-time approaches toward multi-gene methods can offer a more comprehensive view of gene regulation and complex trait-associated variation.

View details for DOI 10.1101/2025.06.06.658175

View details for PubMedID 40502148

View details for PubMedCentralID PMC12157471
Transcriptomic signatures of rare variant impacts across sex and the X-chromosome. HGG advances Ungar, R. A., Li, T., Vetr, N. G., Ersaro, N., Battle, A., Montgomery, S. B. 2025: 100463

Abstract

The human X-chromosome contains hundreds of genes and has well-established impacts on sex differences and traits. However, the X-chromosome is often excluded from many genetic analyses, limiting broader understanding of variant effects. In particular, the functional impact of rare variants on the X-chromosome is understudied. To investigate functional rare variants on the X-chromosome, we use observations of outlier gene expression from GTEx consortium data. We show outlier genes are enriched for having nearby rare variants on the X-chromosome, and this enrichment is stronger for males. Using the RIVER model, we identified 733 rare variants in 450 genes predicted to have functional differences between males and females. We examined the pharmacogenetic implications of these variants and observed that 25% of drugs with a known sex difference in adverse drug reactions were connected to genes that contained a sex-biased rare variant. We further identify that sex-biased rare variants preferentially impact transcription factors with predicted sex-differential binding, such as the XIST-modulated SIX1. Overall, we observed more within-sex variation than between-sex variation. Combined, our study investigates functional rare variants on the X-chromosome, and further details how sex-stratification of variant effect prediction improves identification of rare variants with predicted sex-biased effects, transcription factor biology, and pharmacogenomic impacts.

View details for DOI 10.1016/j.xhgg.2025.100463

View details for PubMedID 40452186
Predicting expression-altering promoter mutations with deep learning. Science (New York, N.Y.) Jaganathan, K., Ersaro, N., Novakovsky, G., Wang, Y., James, T., Schwartzentruber, J., Fiziev, P., Kassam, I., Cao, F., Hawe, J., Cavanagh, H., Lim, A., Png, G., McRae, J., Banerjee, A., Kumar, A., Ulirsch, J., Zhang, Y., Aguet, F., Wainschtein, P., Sundaram, L., Salcedo, A., Kyriazopoulou Panagiotopoulou, S., Aghamirzaie, D., Padhi, E., Weng, Z., Dong, S., Smedley, D., Caulfield, M., O'Donnell-Luria, A., Rehm, H. L., Sanders, S. J., Kundaje, A., Montgomery, S. B., Ross, M. T., Farh, K. K. 2025: eads7373

Abstract

Only a minority of patients with rare genetic diseases are currently diagnosed by exome sequencing, suggesting that additional unrecognized pathogenic variants may reside in non-coding sequence. Here, we describe PromoterAI, a deep neural network that accurately identifies non-coding promoter variants which dysregulate gene expression. We show that promoter variants with predicted expression-altering consequences produce outlier expression at both RNA and protein levels in thousands of individuals, and that these variants experience strong negative selection in human populations. We observe that clinically relevant genes in rare disease patients are enriched for such variants and validate their functional impact through reporter assays. Our estimates suggest that promoter variation accounts for 6% of the genetic burden associated with rare diseases.

View details for DOI 10.1126/science.ads7373

View details for PubMedID 40440429
Integrated single-cell multiome analysis reveals muscle fiber-type gene regulatory circuitry modulated by endurance exercise. Genome research Rubenstein, A. B., Smith, G. R., Zhang, Z., Chen, X., Chambers, T. L., Ruf-Zamojski, F., Mendelev, N., Cheng, W. S., Zamojski, M., Amper, M. A., Nair, V. D., Marderstein, A. R., Montgomery, S. B., Troyanskaya, O. G., Zaslavsky, E., Trappe, T., Trappe, S., Sealfon, S. C. 2025

Abstract

Endurance exercise induces multi-system adaptations that improve performance and benefit health. Gene regulatory circuit responses within individual skeletal muscle cell types, which are key mediators of exercise effects, have not been studied. We mapped transcriptome, chromatin, and regulatory circuit responses to acute endurance exercise in muscle using a same-cell RNA-seq/ATAC-seq multiome assay. High-quality data were obtained from 37,154 nuclei comprising 14 cell types in vastus lateralis samples collected before and 3.5 hours after either 40 min cycling exercise at 70% VO2max or 40 min supine rest. Both shared and cell-type-specific regulatory programs were identified. Differential gene expression and accessibility sites were largely distinct within nuclei for each cell type and muscle fiber, with the largest numbers of regulatory events observed in the three muscle fiber types (slow, fast, and intermediate) and lumican (LUM) expressing fibro-adipogenic progenitor cells. Single-cell regulatory circuit triad reconstruction (transcription factor, chromatin interaction site, regulated gene) also identified largely distinct gene regulatory circuits modulated by exercise in the three muscle fiber types and LUM-expressing fibro-adipogenic progenitor cells, involving a total of 328 transcription factors acting at chromatin sites regulating 2,025 genes. This web-accessible single-cell dataset and regulatory circuitry map serve as a resource for understanding the molecular underpinnings of the metabolic and physiological effects of exercise and to guide interpretation of the exercise response literature in bulk tissue.

View details for DOI 10.1101/gr.280051.124

View details for PubMedID 40393809
Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. Genome research Jensen, T. D., Ni, B., Reuter, C. M., Gorzynski, J. E., Fazal, S., Bonner, D., Ungar, R. A., Goddard, P. C., Raja, A., Ashley, E. A., Bernstein, J. A., Zuchner, S., Greicius, M. D., Montgomery, S. B., Schatz, M. C., Wheeler, M. T., Battle, A. 2025

Abstract

Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4× increase from short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression toward improving the prioritization of functional SVs and TREs in rare disease patients.

View details for DOI 10.1101/gr.279323.124

View details for PubMedID 40113264
DragonRNA: Generality of DNA-primed RNA-extension activities by DNA-directed RNA polymerases. Nucleic acids research Greenwald, E., Galls, D., Park, J., Jain, N., Montgomery, S. B., Roy, B., Yin, Y. W., Fire, A. Z. 2025; 53 (6)

Abstract

RNA polymerases (RNAPs) transcribe DNA into RNA. Several RNAPs, including from bacteriophages Sp6 and T7, Escherichia coli, and wheat germ, had been shown to add ribonucleotides to DNA 3' ends. Mitochondria have their own RNAPs (mtRNAPs). Examining reaction products of RNAPs acting on DNA molecules with free 3' ends, we found yeast and human mtRNAP preparations exhibit a robust activity of extending DNA 3' ends with ribonucleotides. The resulting molecules are serial DNA→RNA chains with the input DNA on the 5' end and extended RNA on the 3' end. Such chains were produced from a wide variety of DNA oligonucleotide inputs with short complementarity in the sequence to the DNA 3' end with the sequence of the RNA portion complementary to the input DNA. We provide a set of fluorescence-based assays for facile detection of such products and show that this activity is a general property of diverse RNAPs, including phage RNAPs and multi-subunit E. coli RNAP. These results support a model in which DNA serves as both primer and template, with extension beginning when the 3' end of the DNA is elongated with a ribonucleotide. As this DNA→RNA class of molecule remains unnamed, we propose the name DragonRNA.

View details for DOI 10.1093/nar/gkaf236

View details for PubMedID 40197829

View details for PubMedCentralID PMC11976148
Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart. bioRxiv : the preprint server for biology Marderstein, A. R., Kundu, S., Padhi, E. M., Deshpande, S., Wang, A., Robb, E., Sun, Y., Yun, C. M., Pomales-Matos, D., Xie, Y., Nachun, D., Jessa, S., Kundaje, A., Montgomery, S. B. 2025

Abstract

Whole genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal common and rare non-coding variants in human disease, and understanding how selective pressures have shaped the non-coding genome, remains a significant challenge. Here, we predicted the effects of 15 million variants with deep learning models trained on single-cell ATAC-seq across 132 cellular contexts in adult and fetal brain and heart, producing nearly two billion context-specific predictions. Using these predictions, we distinguish candidate causal variants underlying human traits and diseases and their context-specific effects. While common variant effects are more cell-type-specific, rare variants exert more cell-type-shared regulatory effects, with selective pressures particularly targeting variants affecting fetal brain neurons. To prioritize de novo mutations with extreme regulatory effects, we developed FLARE, a context-specific functional genomic model of constraint. FLARE outperformed other methods in prioritizing case mutations from autism-affected families near syndromic autism-associated genes; for example, identifying mutation outliers near CNTNAP2 that would be missed by alternative approaches. Overall, our findings demonstrate the potential of integrating single-cell maps with population genetics and deep learning-based variant effect prediction to elucidate mechanisms of development and disease, ultimately supporting the notion that genetic contributions to neurodevelopmental disorders are predominantly rare.

View details for DOI 10.1101/2025.02.18.638922

View details for PubMedID 40027628

View details for PubMedCentralID PMC11870466
Functional analysis of cancer-associated germline risk variants. Nature genetics Kellman, L. N., Neela, P. H., Srinivasan, S., Siprashvili, Z., Shanderson, R. L., Hong, A. W., Rao, D., Porter, D. F., Reynolds, D. L., Meyers, R. M., Guo, M. G., Yang, X., Zhao, Y., Wozniak, G. G., Donohue, L. K., Shenoy, R., Ko, L. A., Nguyen, D. T., Mondal, S., Garcia, O. S., Elcavage, L. E., Elfaki, I., Abell, N. S., Tao, S., Lopez, C. M., Montgomery, S. B., Khavari, P. A. 2025

Abstract

Single-nucleotide variants (SNVs) in regulatory DNA are linked to inherited cancer risk. Massively parallel reporter assays of 4,041 SNVs linked to 13 neoplasms comprising >90% of human malignancies were performed in pertinent primary human cell types and then integrated with matching chromatin accessibility, DNA looping and expression quantitative trait loci data to nominate 380 potentially regulatory SNVs and their putative target genes. The latter highlighted specific protein networks in lifetime cancer risk, including mitochondrial translation, DNA damage repair and Rho GTPase activity. A CRISPR knockout screen demonstrated that a subset of germline putative risk genes also enables the growth of established cancers. Editing one SNV, rs10411210, showed that its risk allele increases rhophilin RHPN2 expression and stimulus-responsive RhoA activation, indicating that individual SNVs may upregulate cancer-linked pathways. These functional data are a resource for variant prioritization efforts and further interrogation of the mechanisms underlying inherited risk for cancer.

View details for DOI 10.1038/s41588-024-02070-5

View details for PubMedID 39962238

View details for PubMedCentralID 3934208
Exercise intensity and training alter the innate immune cell type and chromosomal origins of circulating cell-free DNA in humans. Proceedings of the National Academy of Sciences of the United States of America Rodrigues, K. B., Weng, Z., Graham, Z. A., Lavin, K., McAdam, J., Tuggle, S. C., Peoples, B., Seay, R., Yang, S., Bamman, M. M., Broderick, T. J., Montgomery, S. B. 2025; 122 (3): e2406954122

Abstract

Exercising regularly promotes health, but these benefits are complicated by acute inflammation induced by exercise. A potential source of inflammation is cell-free DNA (cfDNA), yet the cellular origins, molecular causes, and immune system interactions of exercise-induced cfDNA are unclear. To study these, 10 healthy individuals were randomized to a 12-wk exercise program of either high-intensity tactical training (HITT) or traditional moderate-intensity training (TRAD). Blood plasma was collected pre- and postexercise at weeks 0 and 12 and after 4 wk of detraining upon program completion. Whole-genome enzymatic methylation sequencing (EM-seq) with cell-type proportion deconvolution was applied to cfDNA obtained from the 50 plasma samples and paired to concentration measurements for 90 circulating cytokines. Acute exercise increased the release of cfDNA from neutrophils, dendritic cells (DCs), and macrophages proportional to exercise intensity. Exercise training reduced cfDNA released in HITT participants but not TRAD and from DCs and macrophages but not neutrophils. For most participants, training lowered mitochondrial cfDNA at rest, even after detraining. Using a sequencing analysis approach we developed, we concluded that rapid ETosis, a process of cell death where cells release DNA extracellular traps, was the likely source of cfDNA, demonstrated by enrichment of nuclear DNA. Further, several cytokines were induced by acute exercise, such as IL-6, IL-10, and IL-16, and training attenuated the induction of only IL-6 and IL-17F. Cytokine levels were not associated with cfDNA induction, suggesting that these cytokines are not the main cause of exercise-induced cfDNA. Overall, exercise intensity and training modulated cfDNA release and cytokine responses, contributing to the anti-inflammatory effects of regular exercise.

View details for DOI 10.1073/pnas.2406954122

View details for PubMedID 39805013
The human and non-human primate developmental GTEx projects. Nature Coorens, T. H. H., Guillaumet-Adkins, A., Kovner, R., Linn, R. L., Roberts, V. H. J., Sule, A., Van Hoose, P. M., the dGTEx Consortium 2025; 637 (8046): 557-564

Abstract

Many human diseases are the result of early developmental defects. As most paediatric diseases and disorders are rare, children are critically underrepresented in research. Functional genomics studies primarily rely on adult tissues and lack critical cell states in specific developmental windows. In parallel, little is known about the conservation of developmental programmes across non-human primate (NHP) species, with implications for human evolution. Here we introduce the developmental Genotype-Tissue Expression (dGTEx) projects, which span humans and NHPs and aim to integrate gene expression, regulation and genetics data across development and species. The dGTEx cohort will consist of 74 tissue sites across 120 human donors from birth to adulthood, and developmentally matched NHP age groups, with additional prenatal and adult animals, with 126 rhesus macaques (Macaca mulatta) and 72 common marmosets (Callithrix jacchus). The data will comprise whole-genome sequencing, extensive bulk, single-cell and spatial gene expression profiles, and chromatin accessibility data across tissues and development. Through community engagement and donor diversity, the human dGTEx study seeks to address disparities in genomic research. Thus, dGTEx will provide a reference human and NHP dataset and tissue bank, enabling research into developmental changes in expression and gene regulation, childhood disorders and the effect of genetic variation on development.

View details for DOI 10.1038/s41586-024-08244-9

View details for Web of Science ID 001402006100024

View details for PubMedID 39815096

View details for PubMedCentralID PMC12013525
regionalpcs improve discovery of DNA methylation associations with complex traits. Nature communications Eulalio, T., Sun, M. W., Gevaert, O., Greicius, M. D., Montine, T. J., Nachun, D., Montgomery, S. B. 2025; 16 (1): 368

Abstract

We have developed the regionalpcs method, an approach for summarizing gene-level methylation. regionalpcs addresses the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease. In contrast to averaging, regionalpcs uses principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrates a 54% improvement in sensitivity over averaging in simulations, providing a robust framework for identifying subtle epigenetic variations. Applying regionalpcs to Alzheimer's disease brain methylation data, combined with cell type deconvolution, we uncover 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci with genome-wide association studies identified 17 genes with potential causal roles in Alzheimer's disease risk, including MS4A4A and PICALM. Available in the Bioconductor package regionalpcs, our approach facilitates a deeper understanding of the epigenetic landscape in Alzheimer's disease and opens avenues for research into complex diseases.

View details for DOI 10.1038/s41467-024-55698-6

View details for PubMedID 39753567

View details for PubMedCentralID PMC11698866
Transcriptome-wide outlier approach identifies individuals with minor spliceopathies. medRxiv : the preprint server for health sciences Arriaga, M. T., Mendez, R., Ungar, R. A., Bonner, D. E., Matalon, D. R., Lemire, G., Goddard, P. C., Padhi, E. M., Miller, A. M., Nguyen, J. V., Ma, J., Smith, K. S., Scott, S. A., Liao, L., Ng, Z., Marwaha, S., Bademci, G., Bivona, S. A., Tekin, M., Bernstein, J. A., Montgomery, S. B., O'Donnell-Luria, A., Wheeler, M. T., Ganesh, V. S. 2025

Abstract

RNA-sequencing has improved the diagnostic yield of individuals with rare diseases. Current analyses predominantly focus on identifying outliers in single genes that can be attributed to cis-acting variants within or near that gene. This approach overlooks causal variants with trans-acting effects on splicing transcriptome-wide, such as variants impacting spliceosome function. We present a transcriptomics-first method to diagnose individuals with rare diseases by examining transcriptome-wide patterns of splicing outliers. Using splicing outlier detection methods - FRASER and FRASER2 - we identified splicing outliers from whole blood for 390 individuals from the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) and Undiagnosed Diseases Network (UDN) consortia. We examined all samples for excess intron retention events in minor intron containing genes. Minor introns, which make up about 0.5% of all introns in the human genome, are removed by small nuclear RNAs (snRNAs) in the minor spliceosome. This approach identified five cases with excess intron retention events in minor intron containing genes, all of which were found to harbor rare, biallelic variants in the minor spliceosome snRNAs. Four had rare, compound heterozygous variants in RNU4ATAC. These results led to the reclassification of four variants. Additionally, one case had rare, highly conserved, compound heterozygous variants in RNU6ATAC that may disrupt the formation of the catalytic spliceosome, suggesting a novel disease-gene candidate. These results demonstrate that examining RNA-sequencing data for known transcriptome-wide signatures can increase the diagnostic yield of individuals with rare diseases, provide variant-to-functional interpretation of spliceopathies, and potentially uncover novel disease genes.

View details for DOI 10.1101/2025.01.02.24318941

View details for PubMedID 39802771

View details for PubMedCentralID PMC11722475
GREGoR: Accelerating Genomics for Rare Diseases. ArXiv Dawood, M., Heavner, B., Wheeler, M. M., Ungar, R. A., LoTempio, J., Wiel, L., Berger, S., Bernstein, J. A., Chong, J. X., Délot, E. C., Eichler, E. E., Gibbs, R. A., Lupski, J. R., Shojaie, A., Talkowski, M. E., Wagner, A. H., Wei, C. L., Wellington, C., Wheeler, M. T., Carvalho, C. M., Gifford, C. A., May, S., Miller, D. E., Rehm, H. L., Sedlazeck, F. J., Vilain, E., O'Donnell-Luria, A., Posey, J. E., Chadwick, L. H., Bamshad, M. J., Montgomery, S. B. 2024

Abstract

Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize, and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Further, all data generated, currently representing ~7500 individuals from ~3000 families, is rapidly made available to researchers worldwide via the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) to catalyze global efforts to develop approaches for genetic diagnoses in rare diseases (https://gregorconsortium.org/data). The majority of these families have undergone prior clinical genetic testing but remained unsolved, with most being exome-negative. Here, we describe the collaborative research framework, datasets, and discoveries comprising GREGoR that will provide foundational resources and substrates for the future of rare disease genomics.

View details for DOI 10.1101/2024.08.07.24311381

View details for PubMedID 39764392

View details for PubMedCentralID PMC11702807
High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome research Gustafson, J. A., Gibson, S. B., Damaraju, N., Zalusky, M. P., Hoekzema, K., Twesigomwe, D., Yang, L., Snead, A. A., Richmond, P. A., De Coster, W., Olson, N. D., Guarracino, A., Li, Q., Miller, A. L., Goffena, J., Anderson, Z. B., Storz, S. H., Ward, S. A., Sinha, M., Gonzaga-Jauregui, C., Clarke, W. E., Basile, A. O., Corvelo, A., Reeves, C. E., Helland, A., Musunuri, R. L., Revsine, M., Patterson, K. E., Paschal, C., Zakarian, C., Goodwin, S., Jensen, T. D., Robb, E., McCombie, W. R., Sedlazeck, F. J., Zook, J. M., Montgomery, S. B., Garrison, E., Kolmogorov, M., Schatz, M. C., McLaughlin, R. N., Dashnow, H., Zody, M. C., Loose, M., Jain, M., Eichler, E. E., Miller, D. E. 2024

Abstract

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

View details for DOI 10.1101/gr.279273.124

View details for PubMedID 39358015
Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet journal of rare diseases van Karnebeek, C. D., O'Donnell-Luria, A., Baynam, G., Baudot, A., Groza, T., Jans, J. J., Lassmann, T., Letinturier, M. C., Montgomery, S. B., Robinson, P. N., Sansen, S., Mehrian-Shai, R., Steward, C., Kosaki, K., Durao, P., Sadikovic, B. 2024; 19 (1): 357

Abstract

Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.

View details for DOI 10.1186/s13023-024-03361-0

View details for PubMedID 39334316

View details for PubMedCentralID PMC11438178
Single-cell multi-omics map of human fetal blood in Down syndrome. Nature Marderstein, A. R., De Zuani, M., Moeller, R., Bezney, J., Padhi, E. M., Wong, S., Coorens, T. H., Xie, Y., Xue, H., Montgomery, S. B., Cvejic, A. 2024

Abstract

Down syndrome predisposes individuals to haematological abnormalities, such as increased number of erythrocytes and leukaemia in a process that is initiated before birth and is not entirely understood [1-3]. Here, to understand dysregulated haematopoiesis in Down syndrome, we integrated single-cell transcriptomics of over 1.1 million cells with chromatin accessibility and spatial transcriptomics datasets using human fetal liver and bone marrow samples from 3 fetuses with disomy and 15 fetuses with trisomy. We found that differences in gene expression in Down syndrome were dependent on both cell type and environment. Furthermore, we found multiple lines of evidence that haematopoietic stem cells (HSCs) in Down syndrome are 'primed' to differentiate. We subsequently established a Down syndrome-specific map linking non-coding elements to genes in disomic and trisomic HSCs using 10X multiome data. By integrating this map with genetic variants associated with blood cell counts, we discovered that trisomy restructured regulatory interactions to dysregulate enhancer activity and gene expression critical to erythroid lineage differentiation. Furthermore, as mutations in Down syndrome display a signature of oxidative stress [4,5], we validated both increased mitochondrial mass and oxidative stress in Down syndrome, and observed that these mutations preferentially fell into regulatory regions of expressed genes in HSCs. Together, our single-cell, multi-omic resource provides a high-resolution molecular map of fetal haematopoiesis in Down syndrome and indicates significant regulatory restructuring giving rise to co-occurring haematological conditions.

View details for DOI 10.1038/s41586-024-07946-4

View details for PubMedID 39322663

View details for PubMedCentralID 2480572
A lymphocyte chemoaffinity axis for lung, non-intestinal mucosae and CNS. Nature Ocón, B., Xiang, M., Bi, Y., Tan, S., Brulois, K., Ayesha, A., Kunte, M., Zhou, C., LaJevic, M., Lazarus, N., Mengoni, F., Sharma, T., Montgomery, S., Hooper, J. E., Huang, M., Handel, T., Dawson, J. R., Kufareva, I., Zabel, B. A., Pan, J., Butcher, E. C. 2024

Abstract

Tissue-selective chemoattractants direct lymphocytes to epithelial surfaces to establish local immune environments, regulate immune responses to food antigens and commensal organisms, and protect from pathogens. Homeostatic chemoattractants for small intestines, colon, and skin are known [1,2], but chemotropic mechanisms selective for respiratory tract and other non-intestinal mucosal tissues (NIMT) remain poorly understood. Here we leveraged diverse omics datasets to identify GPR25 as a lymphocyte receptor for CXCL17, a chemoattractant cytokine whose expression by epithelial cells of airways, upper gastrointestinal and squamous mucosae unifies the NIMT and distinguishes them from intestinal mucosae. Single-cell transcriptomic analyses show that GPR25 is induced on innate lymphocytes prior to emigration to the periphery, and is imprinted in secondary lymphoid tissues on activated B and T cells responding to immune challenge. GPR25 characterizes B and T tissue resident memory and regulatory T lymphocytes in NIMT and lungs in humans and mediates lymphocyte homing to barrier epithelia of the airways, oral cavity, stomach, biliary and genitourinary tracts in mouse models. GPR25 is also expressed by T cells in cerebrospinal fluid and CXCL17 by neurons, suggesting a role in CNS immune regulation. We reveal widespread imprinting of GPR25 on regulatory T cells, suggesting a mechanistic link to population genetic evidence that GPR25 is protective in autoimmunity [3,4]. Our results define a GPR25-CXCL17 chemoaffinity axis with the potential to integrate immunity and tolerance at non-intestinal mucosae and the CNS.

View details for DOI 10.1038/s41586-024-08043-2

View details for PubMedID 39293486
Deciphering the impact of genomic variation on function. Nature 2024; 633 (8028): 47-57

Abstract

Our genomes influence nearly every aspect of human biology, from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.

View details for DOI 10.1038/s41586-024-07510-0

View details for PubMedID 39232149

View details for PubMedCentralID 7405896
SINGLE-CELL MULTI-OMICS MAP OF HUMAN FOETAL BLOOD IN DOWN'S SYNDROME Cvejic, A., Marderstein, A., Montgomery, S. ELSEVIER SCIENCE INC. 2024

View details for Web of Science ID 001343414100083
SINGLE-CELL MULTI-OMICS MAP OF HUMAN FOETAL BLOOD IN DOWN'S SYNDROME Cvejic, A., Marderstein, A., Montgomery, S. ELSEVIER SCIENCE INC. 2024

View details for Web of Science ID 001325038400023
De novo variants in the RNU4-2 snRNA cause a frequent neurodevelopmental syndrome. Nature Chen, Y., Dawes, R., Kim, H. C., Ljungdahl, A., Stenton, S. L., Walker, S., Lord, J., Lemire, G., Martin-Geary, A. C., Ganesh, V. S., Ma, J., Ellingford, J. M., Delage, E., D'Souza, E. N., Dong, S., Adams, D. R., Allan, K., Bakshi, M., Baldwin, E. E., Berger, S. I., Bernstein, J. A., Bhatnagar, I., Blair, E., Brown, N. J., Burrage, L. C., Chapman, K., Coman, D. J., Compton, A. G., Cunningham, C. A., D'Souza, P., Danecek, P., Délot, E. C., Dias, K. R., Elias, E. R., Elmslie, F., Evans, C. A., Ewans, L., Ezell, K., Fraser, J. L., Gallacher, L., Genetti, C. A., Goriely, A., Grant, C. L., Haack, T., Higgs, J. E., Hinch, A. G., Hurles, M. E., Kuechler, A., Lachlan, K. L., Lalani, S. R., Lecoquierre, F., Leitão, E., Fevre, A. L., Leventer, R. J., Liebelt, J. E., Lindsay, S., Lockhart, P. J., Ma, A. S., Macnamara, E. F., Mansour, S., Maurer, T. M., Mendez, H. R., Metcalfe, K., Montgomery, S. B., Moosajee, M., Nassogne, M. C., Neumann, S., O'Donoghue, M., O'Leary, M., Palmer, E. E., Pattani, N., Phillips, J., Pitsava, G., Pysar, R., Rehm, H. L., Reuter, C. M., Revencu, N., Riess, A., Rius, R., Rodan, L., Roscioli, T., Rosenfeld, J. A., Sachdev, R., Shaw-Smith, C. J., Simons, C., Sisodiya, S. M., Snell, P., St Clair, L., Stark, Z., Stewart, H. S., Tan, T. Y., Tan, N. B., Temple, S. E., Thorburn, D. R., Tifft, C. J., Uebergang, E., VanNoy, G. E., Vasudevan, P., Vilain, E., Viskochil, D. H., Wedd, L., Wheeler, M. T., White, S. M., Wojcik, M., Wolfe, L. A., Wolfenson, Z., Wright, C. F., Xiao, C., Zocche, D., Rubenstein, J. L., Markenscoff-Papadimitriou, E., Fica, S. M., Baralle, D., Depienne, C., MacArthur, D. G., Howson, J. M., Sanders, S. J., O'Donnell-Luria, A., Whiffin, N. 2024

Abstract

Around 60% of individuals with neurodevelopmental disorders (NDD) remain undiagnosed after comprehensive genetic testing, primarily of protein-coding genes [1]. Large genome-sequenced cohorts are improving our ability to discover new diagnoses in the non-coding genome. Here, we identify the non-coding RNA RNU4-2 as a syndromic NDD gene. RNU4-2 encodes the U4 small nuclear RNA (snRNA), which is a critical component of the U4/U6.U5 tri-snRNP complex of the major spliceosome [2]. We identify an 18 bp region of RNU4-2 mapping to two structural elements in the U4/U6 snRNA duplex (the T-loop and Stem III) that is severely depleted of variation in the general population, but in which we identify heterozygous variants in 115 individuals with NDD. Most individuals (77.4%) have the same highly recurrent single base insertion (n.64_65insT). In 54 individuals where it could be determined, the de novo variants were all on the maternal allele. We demonstrate that RNU4-2 is highly expressed in the developing human brain, in contrast to RNU4-1 and other U4 homologs. Using RNA-sequencing, we show how 5' splice site usage is systematically disrupted in individuals with RNU4-2 variants, consistent with the known role of this region during spliceosome activation. Finally, we estimate that variants in this 18 bp region explain 0.4% of individuals with NDD. This work underscores the importance of non-coding genes in rare disorders and will provide a diagnosis to thousands of individuals with NDD worldwide.

View details for DOI 10.1038/s41586-024-07773-7

View details for PubMedID 38991538
Impact of genome build on RNA-seq interpretation and diagnostics. American journal of human genetics Ungar, R. A., Goddard, P. C., Jensen, T. D., Degalez, F., Smith, K. S., Jin, C. A., Bonner, D. E., Bernstein, J. A., Wheeler, M. T., Montgomery, S. B. 2024

Abstract

Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network and Genomics Research to Elucidate the Genetics of Rare Disease Consortium. Across six routinely collected biospecimens, 61% of quantified genes were not influenced by genome build. However, we identified 1,492 genes with build-dependent quantification, 3,377 genes with build-exclusive expression, and 9,077 genes with annotation-specific expression across six routinely collected biospecimens, including 566 clinically relevant and 512 known OMIM genes. Further, we demonstrate that between builds for a given gene, a larger difference in quantification is well correlated with a larger change in expression outlier calling. Combined, we provide a database of genes impacted by build choice and recommend that transcriptomics-guided analyses and diagnoses are cross referenced with these data for robustness.

View details for DOI 10.1016/j.ajhg.2024.05.005

View details for PubMedID 38834072
Loss of function of FAM177A1, a Golgi complex localized protein, causes a novel neurodevelopmental disorder. Genetics in medicine : official journal of the American College of Medical Genetics Kohler, J. N., Legro, N. R., Baldridge, D., Shin, J., Bowman, A., Ugur, B., Jackstadt, M. M., Shriver, L. P., Patti, G. J., Zhang, B., Feng, W., McAdow, A. R., Goddard, P., Ungar, R. A., Jensen, T., Smith, K. S., Fresard, L., Alvarez, R., Bonner, D., Reuter, C. M., McCormack, C., Kravets, E., Marwaha, S., Holt, J. M., Worthey, E., Ashley, E. A., Montgomery, S. B., Fisher, P., Postlethwait, J., De Camilli, P., Solnica-Krezel, L., Bernstein, J. A., Wheeler, M. T. 2024: 101166

Abstract

The function of FAM177A1 and its relationship to human disease is largely unknown. Recent studies have demonstrated FAM177A1 to be a critical immune-associated gene. One previous case study has linked FAM177A1 to a neurodevelopmental disorder in four siblings. We identified five individuals from three unrelated families with biallelic variants in FAM177A1. The physiological function of FAM177A1 was studied in a zebrafish model organism and human cell lines with loss-of-function variants similar to the affected cohort. These individuals share a characteristic phenotype defined by macrocephaly, global developmental delay, intellectual disability, seizures, behavioral abnormalities, hypotonia, and gait disturbance. We show that FAM177A1 localizes to the Golgi complex in mammalian and zebrafish cells. Intersection of the RNA-seq and metabolomic datasets from FAM177A1-deficient human fibroblasts and whole zebrafish larvae demonstrated dysregulation of pathways associated with apoptosis, inflammation, and negative regulation of cell proliferation. Our data sheds light on the emerging function of FAM177A1 and defines FAM177A1-related neurodevelopmental disorder as a new clinical entity.

View details for DOI 10.1016/j.gim.2024.101166

View details for PubMedID 38767059
The impact of exercise on gene regulation in association with complex trait genetics. Nature communications Vetr, N. G., Gay, N. R., MoTrPAC Study Group, Montgomery, S. B., Adkins, J. N., Albertson, B. G., Amar, D., Amper, M. A., Armenteros, J. J., Ashley, E., Avila-Pacheco, J., Bae, D., Balci, A. T., Bamman, M., Bararpour, N., Barton, E. R., Jean Beltran, P. M., Bergman, B. C., Bessesen, D. H., Bodine, S. C., Booth, F. W., Bouverat, B., Buford, T. W., Burant, C. F., Caputo, T., Carr, S., Chambers, T. L., Chavez, C., Chikina, M., Chiu, R., Cicha, M., Clish, C. B., Coen, P. M., Cooper, D., Cornell, E., Cutter, G., Dalton, K. P., Dasari, S., Dennis, C., Esser, K., Evans, C. R., Farrar, R., Fernadez, F. M., Gadde, K., Gagne, N., Gaul, D. A., Ge, Y., Gerszten, R. E., Goodpaster, B. H., Goodyear, L. J., Gritsenko, M. A., Guevara, K., Haddad, F., Hansen, J. R., Harris, M., Hastie, T., Hennig, K. M., Hershman, S. G., Hevener, A., Hirshman, M. F., Hou, Z., Hsu, F., Huffman, K. M., Hung, C., Hutchinson-Bunch, C., Ivanova, A. A., Jackson, B. E., Jankowski, C. M., Jimenez-Morales, D., Jin, C. A., Johannsen, N. M., Newton, R. L., Kachman, M. T., Ke, B. G., Keshishian, H., Kohrt, W. M., Kramer, K. S., Kraus, W. E., Lanza, I., Leeuwenburgh, C., Lessard, S. J., Lester, B., Li, J. Z., Lindholm, M. E., Lira, A. K., Liu, X., Lu, C., Makarewicz, N. S., Maner-Smith, K. M., Mani, D. R., Many, G. M., Marjanovic, N., Marshall, A., Marwaha, S., May, S., Melanson, E. L., Miller, M. E., Monroe, M. E., Moore, S. G., Moore, R. J., Moreau, K. L., Mundorff, C. C., Musi, N., Nachun, D., Nair, V. D., Nair, K. S., Nestor, M. D., Nicklas, B., Nigro, P., Nudelman, G., Ortlund, E. A., Pahor, M., Pearce, C., Petyuk, V. A., Piehowski, P. D., Pincas, H., Powers, S., Presby, D. M., Qian, W., Radom-Aizik, S., Raja, A. N., Ramachandran, K., Ramaker, M. E., Ramos, I., Rankinen, T., Raskind, A. S., Rasmussen, B. B., Ravussin, E., Rector, R. S., Rejeski, W. J., Richards, C. Z., Rirak, S., Robbins, J. M., Rooney, J. L., Rubenstein, A. B., Ruf-Zamojski, F., Rushing, S., Sagendorf, T. J., Samdarshi, M., Sanford, J. A., Savage, E. M., Schauer, I. E., Schenk, S., Schwartz, R. S., Sealfon, S. C., Seenarine, N., Smith, K. S., Smith, G. R., Snyder, M. P., Soni, T., Oliveira De Sousa, L. G., Sparks, L. M., Steep, A., Stowe, C. L., Sun, Y., Teng, C., Thalacker-Mercer, A., Thyfault, J., Tibshirani, R., Tracy, R., Trappe, S., Trappe, T. A., Uppal, K., Vangeti, S., Vasoya, M., Volpi, E., Vornholt, A., Walkup, M. P., Walsh, M. J., Wheeler, M. T., Williams, J. P., Wu, S., Xia, A., Yan, Z., Yu, X., Zang, C., Zaslavsky, E., Zebarjadi, N., Zhang, T., Zhao, B., Zhen, J. 2024; 15 (1): 3346

Abstract

Endurance exercise training is known to reduce risk for a range of complex diseases. However, the molecular basis of this effect has been challenging to study and largely restricted to analyses of either few or easily biopsied tissues. Extensive transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium has provided a unique opportunity to clarify how exercise can affect tissue-specific gene expression and further suggest how exercise adaptation may impact complex disease-associated genes. To build this map, we integrate this multi-tissue atlas of gene expression changes with gene-disease targets, genetic regulation of expression, and trait relationship data in humans. Consensus from multiple approaches prioritizes specific tissues and genes where endurance exercise impacts disease-relevant gene expression. Specifically, we identify a total of 5523 trait-tissue-gene triplets to serve as a valuable starting point for future investigations [Exercise; Transcription; Human Phenotypic Variation].

View details for DOI 10.1038/s41467-024-45966-w

View details for PubMedID 38693125
Temporal dynamics of the multi-omic response to endurance exercise training. Nature 2024; 629 (8010): 174-183

Abstract

Regular exercise promotes whole-body health and prevents disease, but the underlying molecular mechanisms are incompletely understood [1-3]. Here, the Molecular Transducers of Physical Activity Consortium [4] profiled the temporal transcriptome, proteome, metabolome, lipidome, phosphoproteome, acetylproteome, ubiquitylproteome, epigenome and immunome in whole blood, plasma and 18 solid tissues in male and female Rattus norvegicus over eight weeks of endurance exercise training. The resulting data compendium encompasses 9,466 assays across 19 tissues, 25 molecular platforms and 4 training time points. Thousands of shared and tissue-specific molecular alterations were identified, with sex differences found in multiple tissues. Temporal multi-omic and multi-tissue analyses revealed expansive biological insights into the adaptive responses to endurance training, including widespread regulation of immune, metabolic, stress response and mitochondrial pathways. Many changes were relevant to human health, including non-alcoholic fatty liver disease, inflammatory bowel disease, cardiovascular health and tissue injury and recovery. The data and analyses presented in this study will serve as valuable resources for understanding and exploring the multi-tissue molecular effects of endurance training and are provided in a public repository (https://motrpac-data.org/).

View details for DOI 10.1038/s41586-023-06877-w

View details for PubMedID 38693412

View details for PubMedCentralID PMC11062907
regionalpcs: improved discovery of DNA methylation associations with complex traits. bioRxiv : the preprint server for biology Eulalio, T., Sun, M. W., Gevaert, O., Greicius, M. D., Montine, T. J., Nachun, D., Montgomery, S. B. 2024

Abstract

We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.

View details for DOI 10.1101/2024.05.01.590171

View details for PubMedID 38746367

View details for PubMedCentralID PMC11092597
Sexual dimorphism and the multi-omic response to exercise training in rat subcutaneous white adipose tissue. Nature metabolism Many, G. M., Sanford, J. A., Sagendorf, T. J., Hou, Z., Nigro, P., Whytock, K. L., Amar, D., Caputo, T., Gay, N. R., Gaul, D. A., Hirshman, M. F., Jimenez-Morales, D., Lindholm, M. E., Muehlbauer, M. J., Vamvini, M., Bergman, B. C., Fernandez, F. M., Goodyear, L. J., Hevener, A. L., Ortlund, E. A., Sparks, L. M., Xia, A., Adkins, J. N., Bodine, S. C., Newgard, C. B., Schenk, S., MoTrPAC Study Group, Armenteros, J. J., Amper, M. A., Ashley, E., Asokan, A. K., Avila-Pacheco, J., Bae, D., Bamman, M. M., Bararpour, N., Barnes, J., Buford, T. W., Burant, C. F., Carbone, N. P., Carr, S. A., Chambers, T. L., Chavez, C., Chiu, R., Clish, C. B., Cutter, G. R., Dasari, S., Dennis, C., Evans, C. R., Fernandez, F. M., Gagne, N., Ge, Y., Goodpaster, B. H., Gritsenko, M. A., Hansen, J. R., Hennig, K. M., Huffman, K. M., Hung, C., Hutchinson-Bunch, C., Ilkayeva, O., Ivanova, A. A., Beltran, P. M., Jin, C. A., Kachman, M. T., Keshishian, H., Kraus, W. E., Lanza, I., Lester, B., Li, J. Z., Lira, A. K., Liu, X., Maner-Smith, K. M., May, S., Monroe, M. R., Montgomery, S., Moore, R. J., Moore, S. G., Nachun, D., Nair, K. S., Nair, V., Raja, A. N., Nestor, M. D., Nudelman, G., Petyuk, V. A., Piehowski, P. D., Pincas, H., Qian, W., Raskind, A., Rasmussen, B. B., Rooney, J. L., Rushing, S., Samdarshi, M., Sealfon, S. C., Smith, K. S., Smith, G. R., Snyder, M., Stowe, C. L., Talton, J. W., Teng, C., Thalacker-Mercer, A., Tracy, R., Trappe, T. A., Vasoya, M., Vetr, N. G., Volpi, E., Walkup, M. P., Walsh, M. J., Wheeler, M. T., Wu, S., Zaslavsky, E., Zebarjadi, N., Zhang, T., Zhao, B., Zhen, J. 2024

Abstract

Subcutaneous white adipose tissue (scWAT) is a dynamic storage and secretory organ that regulates systemic homeostasis, yet the impact of endurance exercise training (ExT) and sex on its molecular landscape is not fully established. Utilizing an integrative multi-omics approach, and leveraging data generated by the Molecular Transducers of Physical Activity Consortium (MoTrPAC), we show profound sexual dimorphism in the scWAT of sedentary rats and in the dynamic response of this tissue to ExT. Specifically, the scWAT of sedentary females displays -omic signatures related to insulin signaling and adipogenesis, whereas the scWAT of sedentary males is enriched in terms related to aerobic metabolism. These sex-specific -omic signatures are preserved or amplified with ExT. Integration of multi-omic analyses with phenotypic measures identifies molecular hubs predicted to drive sexually distinct responses to training. Overall, this study underscores the powerful impact of sex on adipose tissue biology and provides a rich resource to investigate the scWAT response to ExT.

View details for DOI 10.1038/s42255-023-00959-9

View details for PubMedID 38693320
Molecular adaptations in response to exercise training are associated with tissue-specific transcriptomic and epigenomic signatures. Cell genomics Nair, V. D., Pincas, H., Smith, G. R., Zaslavsky, E., Ge, Y., Amper, M. A., Vasoya, M., Chikina, M., Sun, Y., Raja, A. N., Mao, W., Gay, N. R., Esser, K. A., Smith, K. S., Zhao, B., Wiel, L., Singh, A., Lindholm, M. E., Amar, D., Montgomery, S., Snyder, M. P., Walsh, M. J., Sealfon, S. C., MoTrPAC Study Group 2024: 100421

Abstract

Regular exercise has many physical and brain health benefits, yet the molecular mechanisms mediating exercise effects across tissues remain poorly understood. Here we analyzed 400 high-quality DNA methylation, ATAC-seq, and RNA-seq datasets from eight tissues from control and endurance exercise-trained (EET) rats. Integration of baseline datasets mapped the gene location dependence of epigenetic control features and identified differing regulatory landscapes in each tissue. The transcriptional responses to 8 weeks of EET showed little overlap across tissues and predominantly comprised tissue-type enriched genes. We identified sex differences in the transcriptomic and epigenomic changes induced by EET. However, the sex-biased gene responses were linked to shared signaling pathways. We found that many G protein-coupled receptor-encoding genes are regulated by EET, suggesting a role for these receptors in mediating the molecular adaptations to training across tissues. Our findings provide new insights into the mechanisms underlying EET-induced health benefits across organs.

View details for DOI 10.1016/j.xgen.2023.100421

View details for PubMedID 38697122
Molecular Transducers of Physical Activity Consortium (MoTrPAC): Human Studies Design and Protocol. Journal of applied physiology (Bethesda, Md. : 1985) Group, M. R., Jakicic, J. M., Kohrt, W. M., Houmard, J. A., Miller, M. E., Radom-Aizik, S., Rasmussen, B. B., Ravussin, E., Serra, M., Stowe, C. L., Trappe, S., AbouAssi, H., Adkins, J. N., Alekel, D. L., Ashley, E., Bamman, M. M., Bergman, B. C., Bessesen, D. H., Broskey, N. T., Buford, T. W., Burant, C. F., Chen, H., Christle, J. W., Clish, C. B., Coen, P. M., Collier, D., Collins, K. A., Cooper, D. M., Cortes, T., Cutter, G. R., Dubis, G., Fernandez, F. M., Firnhaber, J., Forman, D. E., Gaul, D. A., Gay, N., Gerszten, R. E., Goodpaster, B. H., Gritsenko, M. A., Haddad, F., Huffman, K. M., Ilkayeva, O., Jankowski, C. M., Jin, C., Johannsen, N. M., Johnson, J., Kelly, L., Kershaw, E., Kraus, W. E., Laughlin, M., Lester, B., Lindholm, M. E., Lowe, A., Lu, C. J., McGowan, J., Melanson, E. L., Montgomery, S., Moore, S. G., Moreau, K. L., Muehlbauer, M., Musi, N., Nair, V. D., Newgard, C. B., Newman, A. B., Nicklas, B., Nindle, B. C., Ormond, K., Piehowski, P. D., Qian, W. J., Rankinen, T., Rejeski, W. J., Robbins, J., Rogers, R. J., Rooney, J. L., Rushing, S., Sanford, J. A., Schauer, I. E., Schwartz, R. S., Sealfon, S. C., Slentz, C., Sloan, R., Smith, K. S., Snyder, M., Spahn, J., Sparks, L. M., Stefanovic-Racic, M., Tanner, C. J., Thalacker-Mercer, A., Tracy, R., Trappe, T. A., Volpi, E., Walsh, M. J., Wheeler, M. T., Willis, L. H. 2024

Abstract

Physical activity, including structured exercise, is associated with favorable health-related chronic disease outcomes. While there is evidence of various molecular pathways that affect these responses, a comprehensive molecular map of these molecular responses to exercise has not been developed. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) is a multi-center study designed to isolate the effects of structured exercise training on the molecular mechanisms underlying the health benefits of exercise and physical activity. MoTrPAC contains both a pre-clinical and human component. The details of the human studies component of MoTrPAC that include the design and methods are presented here. The human studies contain both an adult and pediatric component. In the adult component, sedentary participants are randomized to 12 weeks of Control, Endurance Exercise Training, or Resistance Exercise Training with outcomes measures completed before and following the 12 weeks. The adult component also includes recruitment of highly active endurance trained or resistance trained participants who only complete measures once. A similar design is used for the pediatric component; however, only endurance exercise is examined. Phenotyping measures include weight, body composition, vital signs, cardiorespiratory fitness, muscular strength, physical activity and diet, and other questionnaires. Participants also complete an acute rest period (adults only) or exercise session (adults, pediatrics) with collection of biospecimens (blood only for pediatrics) to allow for examination of the molecular responses. The design and methods of MoTrPAC may inform other studies. Moreover, MoTrPAC will provide a repository of data that can be used broadly across the scientific community.

View details for DOI 10.1152/japplphysiol.00102.2024

View details for PubMedID 38634503
The mitochondrial multi-omic response to exercise training across rat tissues. Cell metabolism Amar, D., Gay, N. R., Jimenez-Morales, D., Jean Beltran, P. M., Ramaker, M. E., Raja, A. N., Zhao, B., Sun, Y., Marwaha, S., Gaul, D. A., Hershman, S. G., Ferrasse, A., Xia, A., Lanza, I., Fernández, F. M., Montgomery, S. B., Hevener, A. L., Ashley, E. A., Walsh, M. J., Sparks, L. M., Burant, C. F., Rector, R. S., Thyfault, J., Wheeler, M. T., Goodpaster, B. H., Coen, P. M., Schenk, S., Bodine, S. C., Lindholm, M. E. 2024

Abstract

Mitochondria have diverse functions critical to whole-body metabolic homeostasis. Endurance training alters mitochondrial activity, but systematic characterization of these adaptations is lacking. Here, the Molecular Transducers of Physical Activity Consortium mapped the temporal, multi-omic changes in mitochondrial analytes across 19 tissues in male and female rats trained for 1, 2, 4, or 8 weeks. Training elicited substantial changes in the adrenal gland, brown adipose, colon, heart, and skeletal muscle. The colon showed non-linear response dynamics, whereas mitochondrial pathways were downregulated in brown adipose and adrenal tissues. Protein acetylation increased in the liver, with a shift in lipid metabolism, whereas oxidative proteins increased in striated muscles. Exercise-upregulated networks were downregulated in human diabetes and cirrhosis. Knockdown of the central network protein 17-beta-hydroxysteroid dehydrogenase 10 (HSD17B10) elevated oxygen consumption, indicative of metabolic stress. We provide a multi-omic, multi-tissue, temporal atlas of the mitochondrial response to exercise training and identify candidates linked to mitochondrial dysfunction.

View details for DOI 10.1016/j.cmet.2023.12.021

View details for PubMedID 38701776
De novo variants in the non-coding spliceosomal snRNA gene RNU4-2 are a frequent cause of syndromic neurodevelopmental disorders. medRxiv : the preprint server for health sciences Chen, Y., Dawes, R., Kim, H. C., Stenton, S. L., Walker, S., Ljungdahl, A., Lord, J., Ganesh, V. S., Ma, J., Martin-Geary, A. C., Lemire, G., D'Souza, E. N., Dong, S., Ellingford, J. M., Adams, D. R., Allan, K., Bakshi, M., Baldwin, E. E., Berger, S. I., Bernstein, J. A., Brown, N. J., Burrage, L. C., Chapman, K., Compton, A. G., Cunningham, C. A., D'Souza, P., Délot, E. C., Dias, K. R., Elias, E. R., Evans, C. A., Ewans, L., Ezell, K., Fraser, J. L., Gallacher, L., Genetti, C. A., Grant, C. L., Haack, T., Kuechler, A., Lalani, S. R., Leitão, E., Fevre, A. L., Leventer, R. J., Liebelt, J. E., Lockhart, P. J., Ma, A. S., Macnamara, E. F., Maurer, T. M., Mendez, H. R., Montgomery, S. B., Nassogne, M. C., Neumann, S., O'Leary, M., Palmer, E. E., Phillips, J., Pitsava, G., Pysar, R., Rehm, H. L., Reuter, C. M., Revencu, N., Riess, A., Rius, R., Rodan, L., Roscioli, T., Rosenfeld, J. A., Sachdev, R., Simons, C., Sisodiya, S. M., Snell, P., Clair, L., Stark, Z., Tan, T. Y., Tan, N. B., Temple, S. E., Thorburn, D. R., Tifft, C. J., Uebergang, E., VanNoy, G. E., Vilain, E., Viskochil, D. H., Wedd, L., Wheeler, M. T., White, S. M., Wojcik, M., Wolfe, L. A., Wolfenson, Z., Xiao, C., Zocche, D., Rubenstein, J. L., Markenscoff-Papadimitriou, E., Fica, S. M., Baralle, D., Depienne, C., MacArthur, D. G., Howson, J. M., Sanders, S. J., O'Donnell-Luria, A., Whiffin, N. 2024

Abstract

Around 60% of individuals with neurodevelopmental disorders (NDD) remain undiagnosed after comprehensive genetic testing, primarily of protein-coding genes. Increasingly, large genome-sequenced cohorts are improving our ability to discover new diagnoses in the non-coding genome. Here, we identify the non-coding RNA RNU4-2 as a novel syndromic NDD gene. RNU4-2 encodes the U4 small nuclear RNA (snRNA), which is a critical component of the U4/U6.U5 tri-snRNP complex of the major spliceosome. We identify an 18 bp region of RNU4-2 mapping to two structural elements in the U4/U6 snRNA duplex (the T-loop and Stem III) that is severely depleted of variation in the general population, but in which we identify heterozygous variants in 119 individuals with NDD. The vast majority of individuals (77.3%) have the same highly recurrent single base-pair insertion (n.64_65insT). We estimate that variants in this region explain 0.41% of individuals with NDD. We demonstrate that RNU4-2 is highly expressed in the developing human brain, in contrast to its contiguous counterpart RNU4-1 and other U4 homologs, supporting RNU4-2's role as the primary U4 transcript in the brain. Overall, this work underscores the importance of non-coding genes in rare disorders. It will provide a diagnosis to thousands of individuals with NDD worldwide and pave the way for the development of effective treatments for these individuals.

View details for DOI 10.1101/2024.04.07.24305438

View details for PubMedID 38645094

View details for PubMedCentralID PMC11030480
Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv : the preprint server for health sciences Jensen, T. D., Ni, B., Reuter, C. M., Gorzynski, J. E., Fazal, S., Bonner, D., Ungar, R. A., Goddard, P. C., Raja, A., Ashley, E. A., Bernstein, J. A., Zuchner, S., Greicius, M. D., Montgomery, S. B., Schatz, M. C., Wheeler, M. T., Battle, A. 2024

Abstract

Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.

View details for DOI 10.1101/2024.03.22.24304565

View details for PubMedID 38585781

View details for PubMedCentralID PMC10996727
Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. medRxiv : the preprint server for health sciences Gustafson, J. A., Gibson, S. B., Damaraju, N., Zalusky, M. P., Hoekzema, K., Twesigomwe, D., Yang, L., Snead, A. A., Richmond, P. A., De Coster, W., Olson, N. D., Guarracino, A., Li, Q., Miller, A. L., Goffena, J., Anderson, Z., Storz, S. H., Ward, S. A., Sinha, M., Gonzaga-Jauregui, C., Clarke, W. E., Basile, A. O., Corvelo, A., Reeves, C., Helland, A., Musunuri, R. L., Revsine, M., Patterson, K. E., Paschal, C. R., Zakarian, C., Goodwin, S., Jensen, T. D., Robb, E., McCombie, W. R., Sedlazeck, F. J., Zook, J. M., Montgomery, S. B., Garrison, E., Kolmogorov, M., Schatz, M. C., McLaughlin, R. N., Dashnow, H., Zody, M. C., Loose, M., Jain, M., Eichler, E. E., Miller, D. E. 2024

Abstract

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

View details for DOI 10.1101/2024.03.05.24303792

View details for PubMedID 38496498

View details for PubMedCentralID PMC10942501
RNA Sequencing in Disease Diagnosis. Annual review of genomics and human genetics Smail, C., Montgomery, S. B. 2024

Abstract

RNA sequencing (RNA-seq) enables the accurate measurement of multiple transcriptomic phenotypes for modeling the impacts of disease variants. Advances in technologies, experimental protocols, and analysis strategies are rapidly expanding the application of RNA-seq to identify disease biomarkers, tissue- and cell-type-specific impacts, and the spatial localization of disease-associated mechanisms. Ongoing international efforts to construct biobank-scale transcriptomic repositories with matched genomic data across diverse population groups are further increasing the utility of RNA-seq approaches by providing large-scale normative reference resources. The availability of these resources, combined with improved computational analysis pipelines, has enabled the detection of aberrant transcriptomic phenotypes underlying rare diseases. Further expansion of these resources, across both somatic and developmental tissues, is expected to soon provide unprecedented insights to resolve disease origin, mechanism of action, and causal gene contributions, suggesting the continued high utility of RNA-seq in disease diagnosis. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 25 is August 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

View details for DOI 10.1146/annurev-genom-021623-121812

View details for PubMedID 38360541
Impact of genome build on RNA-seq interpretation and diagnostics. medRxiv : the preprint server for health sciences Ungar, R. A., Goddard, P. C., Jensen, T. D., Degalez, F., Smith, K. S., Jin, C. A., Bonner, D. E., Bernstein, J. A., Wheeler, M. T., Montgomery, S. B. 2024

Abstract

Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and disease diagnosis. Prior studies have demonstrated that choice of genome build impacts variant interpretation and diagnostic yield for genomic analyses. To identify the extent to which genome build also impacts transcriptomics analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely-collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that only have detectable expression in a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses are cross-referenced with these data for robustness.

View details for DOI 10.1101/2024.01.11.24301165

View details for PubMedID 38260490

View details for PubMedCentralID PMC10802764
Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders. Cell Zhou, B., Arthur, J. G., Guo, H., et al 2024; Published online September 30, 2024

View details for DOI 10.1016/j.cell.2024.09.014
Genetic architecture of cardiac dynamic flow volumes. Nature genetics Gomes, B., Singh, A., O'Sullivan, J. W., Schnurr, T. M., Goddard, P. C., Loong, S., Amar, D., Hughes, J. W., Kostur, M., Haddad, F., Salerno, M., Foo, R., Montgomery, S. B., Parikh, V. N., Meder, B., Ashley, E. A. 2023

Abstract

Cardiac blood flow is a critical determinant of human health. However, the definition of its genetic architecture is limited by the technical challenge of capturing dynamic flow volumes from cardiac imaging at scale. We present DeepFlow, a deep-learning system to extract cardiac flow and volumes from phase-contrast cardiac magnetic resonance imaging. A mixed-linear model applied to 37,653 individuals from the UK Biobank reveals genome-wide significant associations across cardiac dynamic flow volumes spanning from aortic forward velocity to aortic regurgitation fraction. Mendelian randomization reveals a causal role for aortic root size in aortic valve regurgitation. Among the most significant contributing variants, localizing genes (near ELN, PRDM6 and ADAMTS7) are implicated in connective tissue and blood pressure pathways. Here we show that DeepFlow cardiac flow phenotyping at scale, combined with genotyping data, reinforces the contribution of connective tissue genes, blood pressure and root size to aortic valve function.

View details for DOI 10.1038/s41588-023-01587-5

View details for PubMedID 38082205

View details for PubMedCentralID 7612636
Organ aging signatures in the plasma proteome track health and disease. Nature Oh, H. S., Rutledge, J., Nachun, D., Pálovics, R., Abiose, O., Moran-Losada, P., Channappa, D., Urey, D. Y., Kim, K., Sung, Y. J., Wang, L., Timsina, J., Western, D., Liu, M., Kohlfeld, P., Budde, J., Wilson, E. N., Guen, Y., Maurer, T. M., Haney, M., Yang, A. C., He, Z., Greicius, M. D., Andreasson, K. I., Sathyan, S., Weiss, E. F., Milman, S., Barzilai, N., Cruchaga, C., Wagner, A. D., Mormino, E., Lehallier, B., Henderson, V. W., Longo, F. M., Montgomery, S. B., Wyss-Coray, T. 2023; 624 (7990): 164-172

Abstract

Animal studies show aging varies between individuals as well as between organs within an individual, but whether this is true in humans and its effect on age-related diseases is unknown. We utilized levels of human blood plasma proteins originating from specific organs to measure organ-specific aging differences in living individuals. Using machine learning models, we analysed aging in 11 major organs and estimated organ age reproducibly in five independent cohorts encompassing 5,676 adults across the human lifespan. We discovered nearly 20% of the population show strongly accelerated age in one organ and 1.7% are multi-organ agers. Accelerated organ aging confers 20-50% higher mortality risk, and organ-specific diseases relate to faster aging of those organs. We find individuals with accelerated heart aging have a 250% increased heart failure risk and accelerated brain and vascular aging predict Alzheimer's disease (AD) progression independently from and as strongly as plasma pTau-181 (ref. 5), the current best blood-based biomarker for AD. Our models link vascular calcification, extracellular matrix alterations and synaptic protein shedding to early cognitive decline. We introduce a simple and interpretable method to study organ aging using plasma proteomics data, predicting diseases and aging effects.

View details for DOI 10.1038/s41586-023-06802-1

View details for PubMedID 38057571

View details for PubMedCentralID PMC10700136
Transcriptomics and chromatin accessibility in multiple African population samples. bioRxiv : the preprint server for biology DeGorter, M. K., Goddard, P. C., Karakoc, E., Kundu, S., Yan, S. M., Nachun, D., Abell, N., Aguirre, M., Carstensen, T., Chen, Z., Durrant, M., Dwaracherla, V. R., Feng, K., Gloudemans, M. J., Hunter, N., Moorthy, M. P., Pomilla, C., Rodrigues, K. B., Smith, C. J., Smith, K. S., Ungar, R. A., Balliu, B., Fellay, J., Flicek, P., McLaren, P. J., Henn, B., McCoy, R. C., Sugden, L., Kundaje, A., Sandhu, M. S., Gurdasani, D., Montgomery, S. B. 2023

Abstract

Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.

View details for DOI 10.1101/2023.11.04.564839

View details for PubMedID 37986808

View details for PubMedCentralID PMC10659267
Multi-Omic Profiling of Macrophages Lacking Tet2 or Dnmt3a Reveals Mechanisms of Hyper-Inflammation in Clonal Hematopoiesis Rodrigues, K. B., Gopakumar, J., Weng, Z., Mitchell, S., Maurer, M., Nachun, D., Eulalio, T., Estrada, D., Mazumder, T., Ma, L., Montgomery, S., Jaiswal, S. AMER SOC HEMATOLOGY. 2023

View details for DOI 10.1182/blood-2023-187890

View details for Web of Science ID 001159306704186
Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases. Nature genetics Guo, M. G., Reynolds, D. L., Ang, C. E., Liu, Y., Zhao, Y., Donohue, L. K., Siprashvili, Z., Yang, X., Yoo, Y., Mondal, S., Hong, A., Kain, J., Meservey, L., Fabo, T., Elfaki, I., Kellman, L. N., Abell, N. S., Pershad, Y., Bayat, V., Etminani, P., Holodniy, M., Geschwind, D. H., Montgomery, S. B., Duncan, L. E., Urban, A. E., Altman, R. B., Wernig, M., Khavari, P. A. 2023

Abstract

Noncoding variants of presumed regulatory function contribute to the heritability of neuropsychiatric disease. A total of 2,221 noncoding variants connected to risk for ten neuropsychiatric disorders, including autism spectrum disorder, attention deficit hyperactivity disorder, bipolar disorder, borderline personality disorder, major depression, generalized anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder and schizophrenia, were studied in developing human neural cells. Integrating epigenomic and transcriptomic data with massively parallel reporter assays identified differentially-active single-nucleotide variants (daSNVs) in specific neural cell types. Expression-gene mapping, network analyses and chromatin looping nominated candidate disease-relevant target genes modulated by these daSNVs. Follow-up integration of daSNV gene editing with clinical cohort analyses suggested that magnesium transport dysfunction may increase neuropsychiatric disease risk and indicated that common genetic pathomechanisms may mediate specific symptoms that are shared across multiple neuropsychiatric diseases.

View details for DOI 10.1038/s41588-023-01533-5

View details for PubMedID 37857935

View details for PubMedCentralID 4112379
The functional impact of rare variation across the regulatory cascade. Cell genomics Li, T., Ferraro, N., Strober, B. J., Aguet, F., Kasela, S., Arvanitis, M., Ni, B., Wiel, L., Hershberg, E., Ardlie, K., Arking, D. E., Beer, R. L., Brody, J., Blackwell, T. W., Clish, C., Gabriel, S., Gerszten, R., Guo, X., Gupta, N., Johnson, W. C., Lappalainen, T., Lin, H. J., Liu, Y., Nickerson, D. A., Papanicolaou, G., Pritchard, J. K., Qasba, P., Shojaie, A., Smith, J., Sotoodehnia, N., Taylor, K. D., Tracy, R. P., Van Den Berg, D., Wheeler, M. T., Rich, S. S., Rotter, J. I., Battle, A., Montgomery, S. B. 2023; 3 (10): 100401

Abstract

Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis, which included several hundred individuals, with whole-genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer's disease.

View details for DOI 10.1016/j.xgen.2023.100401

View details for PubMedID 37868038

View details for PubMedCentralID PMC10589633
Integrated single-cell multiome analysis reveals muscle fiber-type gene regulatory circuitry modulated by endurance exercise. bioRxiv : the preprint server for biology Rubenstein, A. B., Smith, G. R., Zhang, Z., Chen, X., Chambers, T. L., Ruf-Zamojski, F., Mendelev, N., Cheng, W. S., Zamojski, M., Amper, M. A., Nair, V. D., Marderstein, A. R., Montgomery, S. B., Troyanskaya, O. G., Zaslavsky, E., Trappe, T., Trappe, S., Sealfon, S. C. 2023

Abstract

Endurance exercise is an important health modifier. We studied cell-type specific adaptations of human skeletal muscle to acute endurance exercise using single-nucleus (sn) multiome sequencing in human vastus lateralis samples collected before and 3.5 hours after 40 min exercise at 70% VO2max in four subjects, as well as in matched time of day samples from two supine resting circadian controls. High quality same-cell RNA-seq and ATAC-seq data were obtained from 37,154 nuclei comprising 14 cell types. Among muscle fiber types, both shared and fiber-type specific regulatory programs were identified. Single-cell circuit analysis identified distinct adaptations in fast, slow and intermediate fibers as well as LUM-expressing FAP cells, involving a total of 328 transcription factors (TFs) acting at altered accessibility sites regulating 2,025 genes. These data and circuit mapping provide single-cell insight into the processes underlying tissue and metabolic remodeling responses to exercise.

View details for DOI 10.1101/2023.09.26.558914

View details for PubMedID 37808658

View details for PubMedCentralID PMC10557702
Author Correction: Africa-specific human genetic variation near CHD1L associates with HIV-1 load. Nature McLaren, P. J., Porreca, I., Iaconis, G., Mok, H. P., Mukhopadhyay, S., Karakoc, E., Cristinelli, S., Pomilla, C., Bartha, I., Thorball, C. W., Tough, R. H., Angelino, P., Kiar, C. S., Carstensen, T., Fatumo, S., Porter, T., Jarvis, I., Skarnes, W. C., Bassett, A., DeGorter, M. K., Sathya Moorthy, M. P., Tuff, J. F., Kim, E. Y., Walter, M., Simons, L. M., Bashirova, A., Buchbinder, S., Carrington, M., Cossarizza, A., De Luca, A., Goedert, J. J., Goldstein, D. B., Haas, D. W., Herbeck, J. T., Johnson, E. O., Kaleebu, P., Kilembe, W., Kirk, G. D., Kootstra, N. A., Kral, A. H., Lambotte, O., Luo, M., Mallal, S., Martinez-Picado, J., Meyer, L., Miro, J. M., Moodley, P., Motala, A. A., Mullins, J. I., Nam, K., Obel, N., Pirie, F., Plummer, F. A., Poli, G., Price, M. A., Rauch, A., Theodorou, I., Trkola, A., Walker, B. D., Winkler, C. A., Zagury, J. F., Montgomery, S. B., Ciuffi, A., Hultquist, J. F., Wolinsky, S. M., Dougan, G., Lever, A. M., Gurdasani, D., Groom, H., Sandhu, M. S., Fellay, J. 2023

View details for DOI 10.1038/s41586-023-06591-7

View details for PubMedID 37670157
Beyond the exome: What's next in diagnostic testing for Mendelian conditions. American journal of human genetics Wojcik, M. H., Reuter, C. M., Marwaha, S., Mahmoud, M., Duyzend, M. H., Barseghyan, H., Yuan, B., Boone, P. M., Groopman, E. E., Délot, E. C., Jain, D., Sanchis-Juan, A., Starita, L. M., Talkowski, M., Montgomery, S. B., Bamshad, M. J., Chong, J. X., Wheeler, M. T., Berger, S. I., O'Donnell-Luria, A., Sedlazeck, F. J., Miller, D. E. 2023; 110 (8): 1229-1248

Abstract

Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.

View details for DOI 10.1016/j.ajhg.2023.06.009

View details for PubMedID 37541186
Africa-specific human genetic variation near CHD1L associates with HIV-1 load. Nature McLaren, P. J., Porreca, I., Iaconis, G., Mok, H. P., Mukhopadhyay, S., Karakoc, E., Cristinelli, S., Pomilla, C., Bartha, I., Thorball, C. W., Tough, R. H., Angelino, P., Kiar, C. S., Carstensen, T., Fatumo, S., Porter, T., Jarvis, I., Skarnes, W. C., Bassett, A., DeGorter, M. K., Sathya Moorthy, M. P., Tuff, J. F., Kim, E. Y., Walter, M., Simons, L. M., Bashirova, A., Buchbinder, S., Carrington, M., Cossarizza, A., De Luca, A., Goedert, J. J., Goldstein, D. B., Haas, D. W., Herbeck, J. T., Johnson, E. O., Kaleebu, P., Kilembe, W., Kirk, G. D., Kootstra, N. A., Kral, A. H., Lambotte, O., Luo, M., Mallal, S., Martinez-Picado, J., Meyer, L., Miro, J. M., Moodley, P., Motala, A. A., Mullins, J. I., Nam, K., Obel, N., Pirie, F., Plummer, F. A., Poli, G., Price, M. A., Rauch, A., Theodorou, I., Trkola, A., Walker, B. D., Winkler, C. A., Zagury, J. F., Montgomery, S. B., Ciuffi, A., Hultquist, J. F., Wolinsky, S. M., Dougan, G., Lever, A. M., Gurdasani, D., Groom, H., Sandhu, M. S., Fellay, J. 2023

Abstract

HIV-1 remains a global health crisis, highlighting the need to identify new targets for therapies. Here, given the disproportionate HIV-1 burden and marked human genome diversity in Africa, we assessed the genetic determinants of control of set-point viral load in 3,879 people of African ancestries living with HIV-1 participating in the international collaboration for the genomics of HIV. We identify a previously undescribed association signal on chromosome 1 where the peak variant associates with an approximately 0.3 log10-transformed copies per ml lower set-point viral load per minor allele copy and is specific to populations of African descent. The top associated variant is intergenic and lies between a long intergenic non-coding RNA (LINC00624) and the coding gene CHD1L, which encodes a helicase that is involved in DNA repair. Infection assays in iPS cell-derived macrophages and other immortalized cell lines showed increased HIV-1 replication in CHD1L-knockdown and CHD1L-knockout cells. We provide evidence from population genetic studies that Africa-specific genetic variation near CHD1L associates with HIV replication in vivo. Although experimental studies suggest that CHD1L is able to limit HIV infection in some cell types in vitro, further investigation is required to understand the mechanisms underlying our observations, including any potential indirect effects of CHD1L on HIV spread in vivo that our cell-based assays cannot recapitulate.

View details for DOI 10.1038/s41586-023-06370-4

View details for PubMedID 37532928

View details for PubMedCentralID 3723635
Molecular quantitative trait loci NATURE REVIEWS METHODS PRIMERS Aguet, F., Alasoo, K., Li, Y., Battle, A., Im, H., Montgomery, S. B., Lappalainen, T. 2023; 3 (1)

View details for DOI 10.1038/s43586-022-00188-6

View details for Web of Science ID 000922834900001
Beyond the exome: what's next in diagnostic testing for Mendelian conditions. ArXiv Wojcik, M. H., Reuter, C. M., Marwaha, S., Mahmoud, M., Duyzend, M. H., Barseghyan, H., Yuan, B., Boone, P. M., Groopman, E. E., Délot, E. C., Jain, D., Sanchis-Juan, A., Starita, L. M., Talkowski, M., Montgomery, S. B., Bamshad, M. J., Chong, J. X., Wheeler, M. T., Berger, S. I., O'Donnell-Luria, A., Sedlazeck, F. J., Miller, D. E. 2023

Abstract

Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order and emerging technologies, such as optical genome mapping and long-read DNA or RNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to a consortium such as GREGoR, which is focused on elucidating the underlying cause of rare unsolved genetic disorders.

View details for DOI 10.1002/ajmg.a.63053

View details for PubMedID 36713248

View details for PubMedCentralID PMC9882576
The mitochondrial multi-omic response to exercise training across tissues. bioRxiv : the preprint server for biology Amar, D., Gay, N. R., Jimenez-Morales, D., Beltran, P. M., Ramaker, M. E., Raja, A. N., Zhao, B., Sun, Y., Marwaha, S., Gaul, D., Hershman, S. G., Xia, A., Lanza, I., Fernandez, F. M., Montgomery, S. B., Hevener, A. L., Ashley, E. A., Walsh, M. J., Sparks, L. M., Burant, C. F., Rector, R. S., Thyfault, J., Wheeler, M. T., Goodpaster, B. H., Coen, P. M., Schenk, S., Bodine, S. C., Lindholm, M. E. 2023

Abstract

Mitochondria are adaptable organelles with diverse cellular functions critical to whole-body metabolic homeostasis. While chronic endurance exercise training is known to alter mitochondrial activity, these adaptations have not yet been systematically characterized. Here, the Molecular Transducers of Physical Activity Consortium (MoTrPAC) mapped the longitudinal, multi-omic changes in mitochondrial analytes across 19 tissues in male and female rats endurance trained for 1, 2, 4 or 8 weeks. Training elicited substantial changes in the adrenal gland, brown adipose, colon, heart and skeletal muscle, while we detected mild responses in the brain, lung, small intestine and testes. The colon response was characterized by non-linear dynamics that resulted in upregulation of mitochondrial function that was more prominent in females. Brown adipose and adrenal tissues were characterized by substantial downregulation of mitochondrial pathways. Training induced a previously unrecognized robust upregulation of mitochondrial protein abundance and acetylation in the liver, and a concomitant shift in lipid metabolism. The striated muscles demonstrated a highly coordinated response to increase oxidative capacity, with the majority of changes occurring in protein abundance and post-translational modifications. We identified exercise upregulated networks that are downregulated in human type 2 diabetes and liver cirrhosis. In both cases HSD17B10, a central dehydrogenase in multiple metabolic pathways and mitochondrial tRNA maturation, was the main hub. In summary, we provide a multi-omic, cross-tissue atlas of the mitochondrial response to training and identify candidates for prevention of disease-associated mitochondrial dysfunction.

View details for DOI 10.1101/2023.01.13.523698

View details for PubMedID 36711881

View details for PubMedCentralID PMC9882193
Multiomic identification of key transcriptional regulatory programs during endurance exercise training. bioRxiv : the preprint server for biology Smith, G. R., Zhao, B., Lindholm, M. E., Raja, A., Viggars, M., Pincas, H., Gay, N. R., Sun, Y., Ge, Y., Nair, V. D., Sanford, J. A., S Amper, M. A., Vasoya, M., Smith, K. S., Montgomery, S., Zaslavsky, E., Bodine, S. C., Esser, K. A., Walsh, M. J., Snyder, M. P., Sealfon, S. C., MoTrPAC Study Group 2023

Abstract

Transcription factors (TFs) play a key role in regulating gene expression and responses to stimuli. We conducted an integrated analysis of chromatin accessibility and RNA expression across various rat tissues following endurance exercise training (EET) to map epigenomic changes to transcriptional changes and determine key TFs involved. We uncovered tissue-specific changes across both omic layers, including highly correlated differentially accessible regions (DARs) and differentially expressed genes (DEGs). We identified open chromatin regions associated with DEGs (DEGaPs) and found tissue-specific and genomic feature-specific TF motif enrichment patterns among both DARs and DEGaPs. Accessible promoters of up- vs. down-regulated DEGs per tissue showed distinct TF enrichment patterns. Further, some EET-induced TFs in skeletal muscle were either validated at the proteomic level (MEF2C and NUR77) or correlated with exercise-related phenotypic changes. We provide an in-depth analysis of the epigenetic and trans-factor-dependent processes governing gene expression during EET.

View details for DOI 10.1101/2023.01.10.523450

View details for PubMedID 36711841
RNAget: an API to securely retrieve RNA quantifications. Bioinformatics (Oxford, England) Upchurch, S., Palumbo, E., Adams, J., Bujold, D., Bourque, G., Nedzel, J., Graham, K., Kagda, M. S., Assis, P., Hitz, B., Righi, E., Guigo, R., Wold, B. J., GA4GH RNA-Seq Task Team, Adams, J., Brazma, A., Bujold, D., Burchard, J., Capka, J., Cherry, M., Clarke, L., Craft, B., Dermitzakis, M., Diekhans, M., Dursi, J., Fitzsimons, M. S., Flaming, Z., Garrido, R., Gil, A., Godden, P., Green, M., Guigo, R., Guttman, M., Haas, B., Haeussler, M., Hitz, B., Li, B., Linnarsson, S., Lipski, A., Liu, D., Longerich, S., Lougheed, D., Manning, J., Marioni, J., Meyer, C., Montgomery, S., Morrow, A., Munoz-Power Fuentes, A., Nedzel, J., Nguyen, D., Osborn, K., Ouellette, F., Palumbo, E., Papatheodorou, I., Pervouchine, D., Ramani, A., Rambla, J., Sadjad, B., Steinberg, D., Talkar, J., Tickle, T., Tzeng, K., Upchurch, S., Vaisipour, S., Watford, S., Wold, B., Zhang, Z., Zhu, J. 2023; 39 (4)

Abstract

SUMMARY: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq. AVAILABILITY AND IMPLEMENTATION: https://ga4gh-rnaseq.github.io/schema/docs/index.html.

View details for DOI 10.1093/bioinformatics/btad126

View details for PubMedID 36897015
Methylation differences in Alzheimer's disease neuropathologic change in the aged human brain. Acta neuropathologica communications Lang, A. L., Eulalio, T., Fox, E., Yakabi, K., Bukhari, S. A., Kawas, C. H., Corrada, M. M., Montgomery, S. B., Heppner, F. L., Capper, D., Nachun, D., Montine, T. J. 2022; 10 (1): 174

Abstract

Alzheimer's disease (AD) is the most common cause of dementia with advancing age as its strongest risk factor. AD neuropathologic change (ADNC) is known to be associated with numerous DNA methylation changes in the human brain, but the oldest old (> 90 years) have so far been underrepresented in epigenetic studies of ADNC. Our study participants were individuals aged over 90 years (n = 47) from The 90+ Study. We analyzed DNA methylation from bulk samples in eight precisely dissected regions of the human brain: middle frontal gyrus, cingulate gyrus, entorhinal cortex, dentate gyrus, CA1, substantia nigra, locus coeruleus and cerebellar cortex. We deconvolved our bulk data into cell-type-specific (CTS) signals using computational methods. CTS methylation differences were analyzed across different levels of ADNC. The highest amount of ADNC related methylation differences was found in the dentate gyrus, a region that has so far been underrepresented in large scale multi-omic studies. In neurons of the dentate gyrus, DNA methylation significantly differed with increased burden of amyloid beta (Aβ) plaques at 5897 promoter regions of protein-coding genes. Amongst these, higher Aβ plaque burden was associated with promoter hypomethylation of the Presenilin enhancer 2 (PEN-2) gene, one of the rate limiting genes in the formation of gamma-secretase, a multicomponent complex that is responsible in part for the endoproteolytic cleavage of amyloid precursor protein into Aβ peptides. In addition to novel ADNC related DNA methylation changes, we present the most detailed array-based methylation survey of the old aged human brain to date. Our open-sourced dataset can serve as a brain region reference panel for future studies and help advance research in aging and neurodegenerative diseases.

View details for DOI 10.1186/s40478-022-01470-0

View details for PubMedID 36447297

View details for PubMedCentralID PMC9710143
Deep learning-assisted genome-wide characterization of massively parallel reporter assays. Nucleic acids research Lu, F., Sossin, A., Abell, N., Montgomery, S. B., He, Z. 2022

Abstract

Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC=0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observed that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.

View details for DOI 10.1093/nar/gkac990

View details for PubMedID 36350674
RNA editing underlies genetic risk of common inflammatory diseases. Nature Li, Q., Gloudemans, M. J., Geisinger, J. M., Fan, B., Aguet, F., Sun, T., Ramaswami, G., Li, Y. I., Ma, J. B., Pritchard, J. K., Montgomery, S. B., Li, J. B. 2022

Abstract

A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To achieve this, quantitative trait locus (QTL) mapping of genetic variants with intermediate molecular phenotypes such as gene expression and splicing have been widely adopted. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses, is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed by expected inverted repeat Alu elements as well as unexpected, highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.

View details for DOI 10.1038/s41586-022-05052-x

View details for PubMedID 35922514
Temporal dynamics of the multi-omic response to endurance exercise training across tissues Gay, N. R., Beltran, P., Amar, D., Montgomery, S. B., Carr, S. A., Motrpac Study Grp ELSEVIER. 2022: S31

View details for DOI 10.1016/j.mcpro.2022.100313

View details for Web of Science ID 000898188800027
Integration of rare expression outlier-associated variants improves polygenic risk prediction. American journal of human genetics Smail, C., Ferraro, N. M., Hui, Q., Durrant, M. G., Aguirre, M., Tanigawa, Y., Keever-Keigher, M. R., Rao, A. S., Justesen, J. M., Li, X., Gloudemans, M. J., Assimes, T. L., Kooperberg, C., Reiner, A. P., Huang, J., O'Donnell, C. J., Sun, Y. V., Million Veteran Program, Rivas, M. A., Montgomery, S. B. 2022

Abstract

Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.

View details for DOI 10.1016/j.ajhg.2022.04.015

View details for PubMedID 35588732
Multiple causal variants underlie genetic associations in humans. Science (New York, N.Y.) Abell, N. S., DeGorter, M. K., Gloudemans, M. J., Greenwald, E., Smith, K. S., He, Z., Montgomery, S. B. 2022; 375 (6586): 1247-1254

Abstract

Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.

View details for DOI 10.1126/science.abj5117

View details for PubMedID 35298243
Integration of genetic colocalizations with physiological and pharmacological perturbations identifies cardiometabolic disease genes. Genome medicine Gloudemans, M. J., Balliu, B., Nachun, D., Schnurr, T. M., Durrant, M. G., Ingelsson, E., Wabitsch, M., Quertermous, T., Montgomery, S. B., Knowles, J. W., Carcamo-Orive, I. 2022; 14 (1): 31

Abstract

BACKGROUND: Identification of causal genes for polygenic human diseases has been extremely challenging, and our understanding of how physiological and pharmacological stimuli modulate genetic risk at disease-associated loci is limited. Specifically, insulin resistance (IR), a common feature of cardiometabolic disease, including type 2 diabetes, obesity, and dyslipidemia, lacks well-powered genome-wide association studies (GWAS), and therefore, few associated loci and causal genes have been identified. METHODS: Here, we perform and integrate linkage disequilibrium (LD)-adjusted colocalization analyses across nine cardiometabolic traits (fasting insulin, fasting glucose, insulin sensitivity, insulin sensitivity index, type 2 diabetes, triglycerides, high-density lipoprotein, body mass index, and waist-hip ratio) combined with expression and splicing quantitative trait loci (eQTLs and sQTLs) from five metabolically relevant human tissues (subcutaneous and visceral adipose, skeletal muscle, liver, and pancreas). To elucidate the upstream regulators and functional mechanisms for these genes, we integrate their transcriptional responses to 21 relevant physiological and pharmacological perturbations in human adipocytes, hepatocytes, and skeletal muscle cells and map their protein-protein interactions. RESULTS: We identify 470 colocalized loci and prioritize 207 loci with a single colocalized gene. Patterns of shared colocalizations across traits and tissues highlight different potential roles for colocalized genes in cardiometabolic disease and distinguish several genes involved in pancreatic beta-cell function from others with a more direct role in skeletal muscle, liver, and adipose tissues. At the loci with a single colocalized gene, 42 of these genes were regulated by insulin and 35 by glucose in perturbation experiments, including 17 regulated by both. Other metabolic perturbations regulated the expression of 30 more genes not regulated by glucose or insulin, pointing to other potential upstream regulators of candidate causal genes. CONCLUSIONS: Our use of transcriptional responses under metabolic perturbations to contextualize genetic associations from our custom colocalization approach provides a list of likely causal genes and their upstream regulators in the context of IR-associated cardiometabolic risk.

View details for DOI 10.1186/s13073-022-01036-8

View details for PubMedID 35292083
Integration of genetic colocalizations with physiological and pharmacological perturbations identifies cardiometabolic disease genes Gloudemans, M. J., Balliu, B., Nachun, D., Durrant, M. G., Ingelsson, E., Wabitsch, M., Quertermous, T., Montgomery, S. B., Knowles, J., Carcamo-Orive, I. W B SAUNDERS CO-ELSEVIER INC. 2022: S24-S25

View details for DOI 10.1016/j.metabol.2021.155025

View details for Web of Science ID 000778891500062
TOWARDS TRANSCRIPTOMICS AS A PRIMARY TOOL FOR RARE DISEASE INVESTIGATION. Cold Spring Harbor molecular case studies Montgomery, S. B., Bernstein, J. A., Wheeler, M. T. 2022

Abstract

In the past five years transcriptome or RNA-sequencing (RNA-seq) has steadily emerged as a complementary assay for rare disease diagnosis and discovery. In this perspective, we summarize several recent developments and challenges in use of RNA-seq for rare disease investigation. Using an accessible patient sample, such as blood, skin, or muscle, RNA-seq enables the assay of expressed RNA transcripts. Analysis of RNA-seq allows the identification of aberrant or outlier gene expression and alternative splicing as functional evidence to support rare disease study and diagnosis. Further, many types of variant effects can be profiled beyond coding variants, as the consequences of non-coding variants that impact gene expression and splicing can be directly observed. This is particularly apparent for structural variants which disproportionately underlie outlier gene expression and for splicing variants where RNA-seq can both measure aberrant canonical splicing and detect deep intronic effects. However, a major potential limitation of RNA-seq in rare disease investigation is the developmental and cell type-specificity of gene expression as a pathogenic variant's effect may be limited to a specific spatiotemporal context and access to a patient's tissue sample from the relevant tissue and timing of disease expression may not be possible. We speculate that as advances in computational methods and emerging experimental techniques overcome both developmental and cell type-specificity, there will be broadening use of RNA sequencing and multi-omics in rare disease diagnosis and delivery of precision health.

View details for DOI 10.1101/mcs.a006198

View details for PubMedID 35217565
Lymphoid blast transformation in an MPN with BCR-JAK2 treated with ruxolitinib: putative mechanisms of resistance. Blood advances Chen, J. A., Hou, Y., Roskin, K. M., Arber, D. A., Bangs, C. D., Baughn, L. B., Cherry, A. M., Ewalt, M. D., Fire, A. Z., Fresard, L., Kearney, H. M., Montgomery, S. B., Ohgami, R. S., Pearce, K. E., Pitel, B. A., Merker, J. D., Gotlib, J. 2021; 5 (17): 3492-3496

Abstract

The basis for acquired resistance to JAK inhibition in patients with JAK2-driven hematologic malignancies is not well understood. We report a patient with a myeloproliferative neoplasm (MPN) with a BCR activator of RhoGEF and GTPase (BCR)-JAK2 fusion with initial hematologic response to ruxolitinib who rapidly developed B-lymphoid blast transformation. We analyzed pre-ruxolitinib and blast transformation samples using genome sequencing, DNA mate-pair sequencing (MPseq), RNA sequencing (RNA-seq), and chromosomal microarray to characterize possible mechanisms of resistance. No resistance mutations in the BCR-JAK2 fusion gene or transcript were identified, and fusion transcript expression levels remained stable. However, at the time of blast transformation, MPseq detected a new IKZF1 copy-number loss, which is predicted to result in loss of normal IKZF1 protein translation. RNA-seq revealed significant upregulation of genes negatively regulated by IKZF1, including IL7R and CRLF2. Disease progression was also characterized by adaptation to an activated B-cell receptor (BCR)-like signaling phenotype, with marked upregulation of genes such as CD79A, CD79B, IGLL1, VPREB1, BLNK, ZAP70, RAG1, and RAG2. In summary, IKZF1 deletion and a switch from cytokine dependence to activated BCR-like signaling phenotype represent putative mechanisms of ruxolitinib resistance in this case, recapitulating preclinical data on resistance to JAK inhibition in CRLF2-rearranged Philadelphia chromosome-like acute lymphoblastic leukemia.

View details for DOI 10.1182/bloodadvances.2020004174

View details for PubMedID 34505882
Genome-wide functional screen of 3'UTR variants uncovers causal variants for human disease and evolution. Cell Griesemer, D., Xue, J. R., Reilly, S. K., Ulirsch, J. C., Kukreja, K., Davis, J. R., Kanai, M., Yang, D. K., Butts, J. C., Guney, M. H., Luban, J., Montgomery, S. B., Finucane, H. K., Novina, C. D., Tewhey, R., Sabeti, P. C. 2021

Abstract

3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.

View details for DOI 10.1016/j.cell.2021.08.025

View details for PubMedID 34534445
The role of Sp140 revealed in IgE and mast cell responses in Collaborative Cross mice. JCI insight Matsushita, K., Li, X., Nakamura, Y., Dong, D., Mukai, K., Tsai, M., Montgomery, S. B., Galli, S. J. 2021; 6 (12)

Abstract

Mouse IgE and mast cell (MC) functions have been studied primarily using inbred strains. Here, we (a) identified effects of genetic background on mouse IgE and MC phenotypes, (b) defined the suitability of various strains for studying IgE and MC functions, and (c) began to study potentially novel genes involved in such functions. We screened 47 Collaborative Cross (CC) strains, as well as C57BL/6J and BALB/cJ mice, for strength of passive cutaneous anaphylaxis (PCA) and responses to the intestinal parasite Strongyloides venezuelensis (S.v.). CC mice exhibited a diversity in PCA strength and S.v. responses. Among strains tested, C57BL/6J and CC027 mice showed, respectively, moderate and uniquely potent MC activity. Quantitative trait locus analysis and RNA sequencing of BM-derived cultured MCs (BMCMCs) from CC027 mice suggested Sp140 as a candidate gene for MC activation. siRNA-mediated knock-down of Sp140 in BMCMCs decreased IgE-dependent histamine release and cytokine production. Our results demonstrated marked variations in IgE and MC activity in vivo, and in responses to S.v., across CC strains. C57BL/6J and CC027 represent useful models for studying MC functions. Additionally, we identified Sp140 as a gene that contributes to IgE-dependent MC activation.

View details for DOI 10.1172/jci.insight.146572

View details for PubMedID 34156030
Identification of putative causal loci in whole-genome sequencing data via knockoff statistics. Nature communications He, Z., Liu, L., Wang, C., Le Guen, Y., Lee, J., Gogarten, S., Lu, F., Montgomery, S., Tang, H., Silverman, E. K., Cho, M. H., Greicius, M., Ionita-Laza, I. 2021; 12 (1): 3152

Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from NHLBI Trans-Omics for Precision Medicine (TOPMed) Program we show that our method compared with conventional association tests can lead to substantially more discoveries.

View details for DOI 10.1038/s41467-021-22889-4

View details for PubMedID 34035245
Compound heterozygous KCTD7 variants in progressive myoclonus epilepsy. Journal of neurogenetics Burke, E. A., Sturgeon, M., Zastrow, D. B., Fernandez, L., Prybol, C., Marwaha, S., Frothingham, E. P., Ward, P. A., Eng, C. M., Fresard, L., Montgomery, S. B., Enns, G. M., Fisher, P. G., Wolfe, L. A., Harding, B., Carrington, B., Bishop, K., Sood, R., Huang, Y., Elkahloun, A., Toro, C., Bassuk, A. G., Wheeler, M. T., Markello, T. C., Gahl, W. A., Malicdan, M. C. 2021: 1–10

Abstract

KCTD7 is a member of the potassium channel tetramerization domain-containing protein family and has been associated with progressive myoclonic epilepsy (PME), characterized by myoclonus, epilepsy, and neurological deterioration. Here we report four affected individuals from two unrelated families in which we identified KCTD7 compound heterozygous single nucleotide variants through exome sequencing. RNAseq was used to detect a non-annotated splicing junction created by a synonymous variant in the second family. Whole-cell patch-clamp analysis of neuroblastoma cells overexpressing the patients' variant alleles demonstrated aberrant potassium regulation. While all four patients experienced many of the common clinical features of PME, they also showed variable phenotypes not previously reported, including dysautonomia, brain pathology findings including a significantly reduced thalamus, and the lack of myoclonic seizures. To gain further insight into the pathogenesis of the disorder, zinc finger nucleases were used to generate kctd7 knockout zebrafish. Kctd7 homozygous mutants showed global dysregulation of gene expression and increased transcription of c-fos, which has previously been correlated with seizure activity in animal models. Together these findings expand the known phenotypic spectrum of KCTD7-associated PME, report a new animal model for future studies, and contribute valuable insights into the disease.

View details for DOI 10.1080/01677063.2021.1892095

View details for PubMedID 33970744
Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease. Cell de Goede, O. M., Nachun, D. C., Ferraro, N. M., Gloudemans, M. J., Rao, A. S., Smail, C., Eulalio, T. Y., Aguet, F., Ng, B., Xu, J., Barbeira, A. N., Castel, S. E., Kim-Hellmuth, S., Park, Y., Scott, A. J., Strober, B. J., GTEx Consortium, Brown, C. D., Wen, X., Hall, I. M., Battle, A., Lappalainen, T., Im, H. K., Ardlie, K. G., Mostafavi, S., Quertermous, T., Kirkegaard, K., Montgomery, S. B., Anand, S., Gabriel, S., Getz, G. A., Graubert, A., Hadley, K., Handsaker, R. E., Huang, K. H., Li, X., MacArthur, D. G., Meier, S. R., Nedzel, J. L., Nguyen, D. T., Segre, A. V., Todres, E., Balliu, B., Bonazzola, R., Brown, A., Conrad, D. F., Cotter, D. J., Cox, N., Das, S., Dermitzakis, E. T., Einson, J., Engelhardt, B. E., Eskin, E., Flynn, E. D., Fresard, L., Gamazon, E. R., Garrido-Martin, D., Gay, N. R., Guigo, R., Hamel, A. R., He, Y., Hoffman, P. J., Hormozdiari, F., Hou, L., Jo, B., Kasela, S., Kashin, S., Kellis, M., Kwong, A., Li, X., Liang, Y., Mangul, S., Mohammadi, P., Munoz-Aguirre, M., Nobel, A. B., Oliva, M., Park, Y., Parsana, P., Reverter, F., Rouhana, J. M., Sabatti, C., Saha, A., Stephens, M., Stranger, B. E., Teran, N. A., Vinuela, A., Wang, G., Wright, F., Wucher, V., Zou, Y., Ferreira, P. G., Li, G., Mele, M., Yeger-Lotem, E., Bradbury, D., Krubit, T., McLean, J. A., Qi, L., Robinson, K., Roche, N. V., Smith, A. M., Tabor, D. E., Undale, A., Bridge, J., Brigham, L. E., Foster, B. A., Gillard, B. M., Hasz, R., Hunter, M., Johns, C., Johnson, M., Karasik, E., Kopen, G., Leinweber, W. F., McDonald, A., Moser, M. T., Myer, K., Ramsey, K. D., Roe, B., Shad, S., Thomas, J. A., Walters, G., Washington, M., Wheeler, J., Jewell, S. D., Rohrer, D. C., Valley, D. R., Davis, D. A., Mash, D. C., Barcus, M. E., Branton, P. A., Sobin, L., Barker, L. K., Gardiner, H. M., Mosavel, M., Siminoff, L. A., Flicek, P., Haeussler, M., Juettemann, T., Kent, W. J., Lee, C. M., Powell, C. C., Rosenbloom, K. R., Ruffier, M., Sheppard, D., Taylor, K., Trevanion, S. J., Zerbino, D. R., Abell, N. S., Akey, J., Chen, L., Demanelis, K., Doherty, J. A., Feinberg, A. P., Hansen, K. D., Hickey, P. F., Jasmine, F., Jiang, L., Kaul, R., Kibriya, M. G., Li, J. B., Li, Q., Lin, S., Linder, S. E., Pierce, B. L., Rizzardi, L. F., Skol, A. D., Smith, K. S., Snyder, M., Stamatoyannopoulos, J., Tang, H., Wang, M., Carithers, L. J., Guan, P., Koester, S. E., Little, A. R., Moore, H. M., Nierras, C. R., Rao, A. K., Vaught, J. B., Volpi, S. 2021

Abstract

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

View details for DOI 10.1016/j.cell.2021.03.050

View details for PubMedID 33864768
Functional and structural analysis of cytokine selective IL6ST defects that cause recessive hyper-IgE syndrome. The Journal of allergy and clinical immunology Chen, Y., Zastrow, D. B., Metcalfe, R. D., Gartner, L., Krause, F., Morton, C. J., Marwaha, S., Fresard, L., Huang, Y., Zhao, C., McCormack, C., Bick, D., Worthey, E. A., Eng, C. M., Gold, J., Undiagnosed Diseases Network, Montgomery, S. B., Fisher, P. G., Ashley, E. A., Wheeler, M. T., Parker, M. W., Shanmugasundaram, V., Putoczki, T. L., Schmidt-Arras, D., Laurence, A., Bernstein, J. A., Griffin, M. D., Uhlig, H. H. 2021

Abstract

BACKGROUND: Biallelic variants in IL6ST cause a recessive form of hyper-IgE syndrome (HIES) characterized by high IgE, eosinophilia, defective acute phase response, susceptibility to bacterial infections and skeletal abnormalities due to cytokine selective loss-of-function in GP130 with defective IL-6 and IL-11, variable OSM and IL-27 but sparing LIF signaling. OBJECTIVE: To understand the functional and structural impact of recessive HIES-associated IL6ST variants. METHODS: We investigated a patient with HIES using exome, genome and RNA sequencing. Functional assays assessed IL-6, IL-11, IL-27, OSM, LIF, CT-1, CLC, and CNTF signaling. Molecular dynamic simulations and structural modeling of GP130 cytokine receptor complexes were performed. RESULTS: We identify a patient with compound heterozygous novel missense variants in IL6ST (p.Ala517Pro, and exon-skipping null variant p.Gly484_Pro518delinsArg). The p.Ala517Pro variant results in a more profound IL-6 and IL-11 dominated signaling defect compared to the previously identified recessive IL6ST variants p.Asn404Tyr, and p.Pro498Leu. Molecular dynamics simulations suggest that the p.Ala517Pro and p.Asn404Tyr variants result in increased flexibility of the extracellular membrane-proximal domains of GP130. We propose a structural model that explains the cytokine selectivity of pathogenic IL6ST variants that result in recessive HIES. The variants destabilize the hexameric cytokine receptor complexes whereas the trimeric LIF-GP130-LIFR complex remains stable by an additional membrane-proximal interaction. Deletion of this membrane-proximal interaction site in GP130 consequently causes additional defective LIF signaling and Stuve-Wiedemann syndrome. CONCLUSION: Our data provide a structural basis to understand clinical phenotypes in patients with IL6ST variants.

View details for DOI 10.1016/j.jaci.2021.02.044

View details for PubMedID 33771552
Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nature genetics Bonder, M. J., Smail, C., Gloudemans, M. J., Fresard, L., Jakubosky, D., D'Antonio, M., Li, X., Ferraro, N. M., Carcamo-Orive, I., Mirauta, B., Seaton, D. D., Cai, N., Vakili, D., Horta, D., Zhao, C., Zastrow, D. B., Bonner, D. E., HipSci Consortium, iPSCORE consortium, Undiagnosed Diseases Network, PhLiPS consortium, Wheeler, M. T., Kilpinen, H., Knowles, J. W., Smith, E. N., Frazer, K. A., Montgomery, S. B., Stegle, O., Jan Bonder, M., Seaton, D., Jakubosky, D. A., Brown, C. D., Park, Y. 2021

Abstract

Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.

View details for DOI 10.1038/s41588-021-00800-7

View details for PubMedID 33664507
Evaluating the Genomic Parameters Governing rAAV-Mediated Homologous Recombination MOLECULAR THERAPY Spector, L. P., Tiffany, M., Ferraro, N. M., Abell, N. S., Montgomery, S. B., Kay, M. A. 2021; 29 (3): 1028–46

View details for DOI 10.1016/j.ymthe.2020.11.025

View details for Web of Science ID 000632042500016
Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome biology Barbeira, A. N., Bonazzola, R., Gamazon, E. R., Liang, Y., Park, Y., Kim-Hellmuth, S., Wang, G., Jiang, Z., Zhou, D., Hormozdiari, F., Liu, B., Rao, A., Hamel, A. R., Pividori, M. D., Aguet, F., GTEx GWAS Working Group, Bastarache, L., Jordan, D. M., Verbanck, M., Do, R., GTEx Consortium, Stephens, M., Ardlie, K., McCarthy, M., Montgomery, S. B., Segre, A. V., Brown, C. D., Lappalainen, T., Wen, X., Im, H. K. 2021; 22 (1): 49

Abstract

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

View details for DOI 10.1186/s13059-020-02252-4

View details for PubMedID 33499903
Nonsense-mediated decay is highly stable across individuals and tissues. American journal of human genetics Teran, N. A., Nachun, D. C., Eulalio, T., Ferraro, N. M., Smail, C., Rivas, M. A., Montgomery, S. B. 2021

Abstract

Precise interpretation of the effects of rare protein-truncating variants (PTVs) is important for accurate determination of variant impact. Current methods for assessing the ability of PTVs to induce nonsense-mediated decay (NMD) focus primarily on the position of the variant in the transcript. We used RNA sequencing of the Genotype Tissue Expression v.8 cohort to compute the efficiency of NMD using allelic imbalance for 2,320 rare (genome aggregation database minor allele frequency ≤ 1%) PTVs across 809 individuals in 49 tissues. We created an interpretable predictive model using penalized logistic regression in order to evaluate the comprehensive influence of variant annotation, tissue, and inter-individual variation on NMD. We found that variant position, allele frequency, the inclusion of ultra-rare and singleton variants, and conservation were predictive of allelic imbalance. Furthermore, we found that NMD effects were highly concordant across tissues and individuals. Due to this high consistency, we demonstrate in silico that utilizing peripheral tissues or cell lines provides accurate prediction of NMD for PTVs.

View details for DOI 10.1016/j.ajhg.2021.06.008

View details for PubMedID 34216550
An integrated approach to identify environmental modulators of genetic risk factors for complex traits. American journal of human genetics Balliu, B., Carcamo-Orive, I., Gloudemans, M. J., Nachun, D. C., Durrant, M. G., Gazal, S., Park, C. Y., Knowles, D. A., Wabitsch, M., Quertermous, T., Knowles, J. W., Montgomery, S. B. 2021

Abstract

Complex traits and diseases can be influenced by both genetics and environment. However, given the large number of environmental stimuli and power challenges for gene-by-environment testing, it remains a critical challenge to identify and prioritize specific disease-relevant environmental exposures. We propose a framework for leveraging signals from transcriptional responses to environmental perturbations to identify disease-relevant perturbations that can modulate genetic risk for complex traits and inform the functions of genetic variants associated with complex traits. We perturbed human skeletal-muscle-, fat-, and liver-relevant cell lines with 21 perturbations affecting insulin resistance, glucose homeostasis, and metabolic regulation in humans and identified thousands of environmentally responsive genes. By combining these data with GWASs from 31 distinct polygenic traits, we show that the heritability of multiple traits is enriched in regions surrounding genes responsive to specific perturbations and, further, that environmentally responsive genes are enriched for associations with specific diseases and phenotypes from the GWAS Catalog. Overall, we demonstrate the advantages of large-scale characterization of transcriptional changes in diversely stimulated and pathologically relevant cells to identify disease-relevant perturbations.

View details for DOI 10.1016/j.ajhg.2021.08.014

View details for PubMedID 34582792
Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer's and Parkinson's diseases. Nature genetics Corces, M. R., Shcherbina, A., Kundu, S., Gloudemans, M. J., Fresard, L., Granja, J. M., Louie, B. H., Eulalio, T., Shams, S., Bagdatli, S. T., Mumbach, M. R., Liu, B., Montine, K. S., Greenleaf, W. J., Kundaje, A., Montgomery, S. B., Chang, H. Y., Montine, T. J. 2020

Abstract

Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.

View details for DOI 10.1038/s41588-020-00721-x

View details for PubMedID 33106633
The GTEx Consortium atlas of genetic regulatory effects across human tissues SCIENCE Aguet, F., Barbeira, A. N., Bonazzola, R., Brown, A., Castel, S. E., Jo, B., Kasela, S., Kim-Hellmuth, S., Liang, Y., Parsana, P., Flynn, E., Fresard, L., Gamazon, E. R., Hamel, A. R., He, Y., Hormozdiari, F., Mohammadi, P., Munoz-Aguirre, M., Ardlie, K. G., Battle, A., Bonazzola, R., Brown, C. D., Cox, N., Dermitzakis, E. T., Engelhardt, B. E., Garrido-Martin, D., Gay, N. R., Getz, G., Guigo, R., Hamel, A. R., Handsaker, R. E., He, Y., Hoffman, P. J., Hormozdiari, F., Im, H., Jo, B., Kasela, S., Kashin, S., Kim-Hellmuth, S., Kwong, A., Lappalainen, T., Li, X., Liang, Y., MacArthur, D. G., Mohammadi, P., Montgomery, S. B., Munoz-Aguirre, M., Rouhana, J. M., Hormozdiari, F., Im, H., Kim-Hellmuth, S., Ardlie, K. G., Getz, G., Guigo, R., Im, H., Lappalainen, T., Montgomery, S. B., Im, H., Lappalainen, T., Lappalainen, T., Anand, S., Gabriel, S., Getz, G., Graubert, A., Hadley, K., Handsaker, R. E., Huang, K. H., Kashin, S., Li, X., MacArthur, D. G., Meier, S. R., Nedzel, J. L., Balliu, B., Conrad, D., Cotter, D. J., Das, S., de Goede, O. M., Eskin, E., Eulalio, T. Y., Ferraro, N. M., Garrido-Martin, D., Gay, N. R., Getz, G., Graubert, A., Guigo, R., Hadley, K., Hamel, A. R., Handsaker, R. E., He, Y., Hoffman, P. J., Hormozdiari, F., Hou, L., Huang, K. H., Im, H., Jo, B., Kasela, S., Kashin, S., Kellis, M., Kim-Hellmuth, S., Kwong, A., Lappalainen, T., Li, X., Li, X., Liang, Y., MacArthur, D. G., Mangul, S., Meier, S. R., Mohammadi, P., Montgomery, S. B., Munoz-Aguirre, M., Nachun, D. C., Nedzel, J. L., Nguyen, D. Y., Nobel, A. B., Park, Y., Reverter, F., Sabatti, C., Saha, A., Segre, A., Stephens, M., Strober, B. J., Teran, N. A., Todres, E., Vinuela, A., Wang, G., Wen, X., Wright, F., Wucher, V., Zou, Y., Ferreira, P. G., Li, G., Mele, M., Yeger-Lotem, E., Barcus, M. E., Bradbury, D., Krubit, T., McLean, J. A., Qi, L., Robinson, K., Roche, N., Smith, A. M., Tabor, D. 
E., Undale, A., Bridge, J., Brigham, L. E., Foster, B. A., Gillard, B. M., Hasz, R., Hunter, M., Johns, C., Johnson, M., Karasik, E., Kopen, G., Leinweber, W. F., McDonald, A., Moser, M. T., Myer, K., Ramsey, K. D., Roe, B., Shad, S., Thomas, J. A., Walters, G., Washington, M., Wheeler, J., Jewell, S. D., Rohrer, D. C., Valley, D. R., Davis, D. A., Mash, D. C., Branton, P. A., Sobin, L., Barker, L. K., Gardiner, H. M., Mosavel, M., Siminoff, L. A., Flicek, P., Haeussler, M., Juettemann, T., Kent, W., Lee, C. M., Powell, C. C., Rosenbloom, K. R., Ruffier, M., Sheppard, D., Taylor, K., Trevanion, S. J., Zerbino, D. R., Abell, N. S., Akey, J., Chen, L., Demanelis, K., Doherty, J. A., Feinberg, A. P., Hansen, K. D., Hickey, P. F., Hou, L., Jasmine, F., Jiang, L., Kaul, R., Kellis, M., Kibriya, M. G., Li, J., Li, Q., Lin, S., Linder, S. E., Montgomery, S. B., Oliva, M., Park, Y., Pierce, B. L., Rizzardi, L. F., Skol, A. D., Smith, K. S., Snyder, M., Stamatoyannopoulos, J., Tang, H., Wang, M., Carithers, L. J., Guan, P., Koester, S. E., Little, A., Moore, H. M., Nierras, C. R., Rao, A. K., Vaught, J. B., Volpi, S., GTEx Consortium 2020; 369 (6509): 1318-+

View details for DOI 10.1126/science.aaz1776

View details for Web of Science ID 000569840300041
Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise. Cell Sanford, J. A., Nogiec, C. D., Lindholm, M. E., Adkins, J. N., Amar, D., Dasari, S., Drugan, J. K., Fernandez, F. M., Radom-Aizik, S., Schenk, S., Snyder, M. P., Tracy, R. P., Vanderboom, P., Trappe, S., Walsh, M. J., Molecular Transducers of Physical Activity Consortium, Adkins, J. N., Amar, D., Dasari, S., Drugan, J. K., Evans, C. R., Fernandez, F. M., Li, Y., Lindholm, M. E., Nogiec, C. D., Radom-Aizik, S., Sanford, J. A., Schenk, S., Snyder, M. P., Tomlinson, L., Tracy, R. P., Trappe, S., Vanderboom, P., Walsh, M. J., Alekel, D. L., Bekirov, I., Boyce, A. T., Boyington, J., Fleg, J. L., Joseph, L. J., Laughlin, M. R., Maruvada, P., Morris, S. A., McGowan, J. A., Nierras, C., Pai, V., Peterson, C., Ramos, E., Roary, M. C., Williams, J. P., Xia, A., Cornell, E., Rooney, J., Miller, M. E., Ambrosius, W. T., Rushing, S., Stowe, C. L., Rejeski, W. J., Nicklas, B. J., Pahor, M., Lu, C., Trappe, T., Chambers, T., Raue, U., Lester, B., Bergman, B. C., Bessesen, D. H., Jankowski, C. M., Kohrt, W. M., Melanson, E. L., Moreau, K. L., Schauer, I. E., Schwartz, R. S., Kraus, W. E., Slentz, C. A., Huffman, K. M., Johnson, J. L., Willis, L. H., Kelly, L., Houmard, J. A., Dubis, G., Broskey, N., Goodpaster, B. H., Sparks, L. M., Coen, P. M., Cooper, D. M., Haddad, F., Rankinen, T., Ravussin, E., Johannsen, N., Harris, M., Jakicic, J. M., Newman, A. B., Forman, D. D., Kershaw, E., Rogers, R. J., Nindl, B. C., Page, L. C., Stefanovic-Racic, M., Barr, S. L., Rasmussen, B. B., Moro, T., Paddon-Jones, D., Volpi, E., Spratt, H., Musi, N., Espinoza, S., Patel, D., Serra, M., Gelfond, J., Burns, A., Bamman, M. M., Buford, T. W., Cutter, G. R., Bodine, S. C., Esser, K., Farrar, R. P., Goodyear, L. J., Hirshman, M. F., Albertson, B. G., Qian, W., Piehowski, P., Gritsenko, M. A., Monore, M. E., Petyuk, V. A., McDermott, J. E., Hansen, J. N., Hutchison, C., Moore, S., Gaul, D. 
A., Clish, C. B., Avila-Pacheco, J., Dennis, C., Kellis, M., Carr, S., Jean-Beltran, P. M., Keshishian, H., Mani, D. R., Clauser, K., Krug, K., Mundorff, C., Pearce, C., Ivanova, A. A., Ortlund, E. A., Maner-Smith, K., Uppal, K., Zhang, T., Sealfon, S. C., Zavlasky, E., Nair, V., Li, S., Jain, N., Ge, Y., Sun, Y., Nudelman, G., Ruf-Zamojski, F., Smith, G., Pincas, N., Rubenstein, A., Amper, M. A., Seenarine, N., Lappalainen, T., Lanza, I. R., Nair, K. S., Klaus, K., Montgomery, S. B., Smith, K. S., Gay, N. R., Zhao, B., Hung, C. J., Zebarjadi, N., Balliu, B., Fresard, L., Burant, C. F., Li, J. Z., Kachman, M., Soni, T., Raskind, A. B., Gerszten, R., Robbins, J., Ilkayeva, O., Muehlbauer, M. J., Newgard, C. B., Ashley, E. A., Wheeler, M. T., Jimenez-Morales, D., Raja, A., Dalton, K. P., Zhen, J., Kim, Y. S., Christle, J. W., Marwaha, S., Chin, E. T., Hershman, S. G., Hastie, T., Tibshirani, R., Rivas, M. A. 2020; 181 (7): 1464–74

Abstract

Exercise provides a robust physiological stimulus that evokes cross-talk among multiple tissues that when repeated regularly (i.e., training) improves physiological capacity, benefits numerous organ systems, and decreases the risk for premature mortality. However, a gap remains in identifying the detailed molecular signals induced by exercise that benefits health and prevents disease. The Molecular Transducers of Physical Activity Consortium (MoTrPAC) was established to address this gap and generate a molecular map of exercise. Preclinical and clinical studies will examine the systemic effects of endurance and resistance exercise across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. From this multi-omic and bioinformatic analysis, a molecular map of exercise will be established. Altogether, MoTrPAC will provide a public database that is expected to enhance our understanding of the health benefits of exercise and to provide insight into how physical activity mitigates disease.

View details for DOI 10.1016/j.cell.2020.06.004

View details for PubMedID 32589957
Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nature communications Jakubosky, D., Smith, E. N., D'Antonio, M., Jan Bonder, M., Young Greenwald, W. W., D'Antonio-Chronowska, A., Matsui, H., i2QTL Consortium, Stegle, O., Montgomery, S. B., DeBoever, C., Frazer, K. A., Bonder, M. J., Cai, N., Carcamo-Orive, I., D'Antonio, M., Frazer, K. A., Young Greenwald, W. W., Jakubosky, D., Knowles, J. W., Matsui, H., McCarthy, D. J., Mirauta, B. A., Montgomery, S. B., Quertermous, T., Seaton, D. D., Smail, C., Smith, E. N., Stegle, O. 2020; 11 (1): 2928

Abstract

Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42×) from 477 distinct individuals which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well characterized maps of SVs and STRs to date.

View details for DOI 10.1038/s41467-020-16481-5

View details for PubMedID 32522985
Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nature communications Jakubosky, D., D'Antonio, M., Bonder, M. J., Smail, C., Donovan, M. K., Young Greenwald, W. W., Matsui, H., i2QTL Consortium, D'Antonio-Chronowska, A., Stegle, O., Smith, E. N., Montgomery, S. B., DeBoever, C., Frazer, K. A., Bonder, M. J., Cai, N., Carcamo-Orive, I., D'Antonio, M., Frazer, K. A., Young Greenwald, W. W., Jakubosky, D., Knowles, J. W., Matsui, H., McCarthy, D. J., Mirauta, B. A., Montgomery, S. B., Quertermous, T., Seaton, D. D., Smail, C., Smith, E. N., Stegle, O. 2020; 11 (1): 2927

Abstract

Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.

View details for DOI 10.1038/s41467-020-16482-4

View details for PubMedID 32522982
Transcriptional and Position Effect Contributions to rAAV-Mediated Gene Targeting Spector, L. P., Tiffany, M., Ferraro, N. M., Abell, N. S., Montgomery, S. B., Kay, M. A. CELL PRESS. 2020: 290

View details for Web of Science ID 000530089301198
Molecular Choreography of Acute Exercise. Cell Contrepois, K., Wu, S., Moneghetti, K. J., Hornburg, D., Ahadi, S., Tsai, M. S., Metwally, A. A., Wei, E., Lee-McMullen, B., Quijada, J. V., Chen, S., Christle, J. W., Ellenberger, M., Balliu, B., Taylor, S., Durrant, M. G., Knowles, D. A., Choudhry, H., Ashland, M., Bahmani, A., Enslen, B., Amsallem, M., Kobayashi, Y., Avina, M., Perelman, D., Schüssler-Fiorenza Rose, S. M., Zhou, W., Ashley, E. A., Montgomery, S. B., Chaib, H., Haddad, F., Snyder, M. P. 2020; 181 (5): 1112–30.e16

Abstract

Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.

View details for DOI 10.1016/j.cell.2020.04.043

View details for PubMedID 32470399
Evaluating the genomic parameters governing rAAV-mediated homologous recombination. Molecular therapy : the journal of the American Society of Gene Therapy Spector, L. P., Tiffany, M., Ferraro, N. M., Abell, N. S., Montgomery, S. B., Kay, M. A. 2020

Abstract

Recombinant AAV vectors have the unique ability to promote targeted integration of transgenes via homologous recombination at specified genomic sites reaching frequencies of 0.1-1%. We studied genomic parameters that influence targeting efficiencies on a large scale. To do this, we generated more than 1000 engineered, doxycycline-inducible target sites in the human HAP1 cell line and infected this polyclonal population with a library of AAV-DJ targeting vectors each carrying a unique barcode. The heterogeneity of barcode integration at each target site provided an assessment of targeting efficiency at that locus. We compared targeting efficiency with and without target site transcription for identical chromosomal positions. Targeting efficiency was enhanced by target site transcription, while chromatin accessibility was associated with an increased likelihood of targeting. ChromHMM chromatin states characterizing transcription and enhancers in wildtype K562 cells were also associated with increased AAV-HR efficiency with and without target site transcription, respectively. Furthermore, the amenability of a site to targeting was influenced by the endogenous transcriptional level of intersecting genes. These results define important parameters that may not only assist in designing optimal targeting vectors for genome editing, but also provide new insights into the mechanism of AAV-mediated homologous recombination.

View details for DOI 10.1016/j.ymthe.2020.11.025

View details for PubMedID 33248247
The impact of sex on gene expression across human tissues. Science (New York, N.Y.) Oliva, M., Muñoz-Aguirre, M., Kim-Hellmuth, S., Wucher, V., Gewirtz, A. D., Cotter, D. J., Parsana, P., Kasela, S., Balliu, B., Viñuela, A., Castel, S. E., Mohammadi, P., Aguet, F., Zou, Y., Khramtsova, E. A., Skol, A. D., Garrido-Martín, D., Reverter, F., Brown, A., Evans, P., Gamazon, E. R., Payne, A., Bonazzola, R., Barbeira, A. N., Hamel, A. R., Martinez-Perez, A., Soria, J. M., Pierce, B. L., Stephens, M., Eskin, E., Dermitzakis, E. T., Segrè, A. V., Im, H. K., Engelhardt, B. E., Ardlie, K. G., Montgomery, S. B., Battle, A. J., Lappalainen, T., Guigó, R., Stranger, B. E. 2020; 369 (6509)

Abstract

Many complex human phenotypes exhibit sex-differentiated characteristics. However, the molecular mechanisms underlying these differences remain largely unknown. We generated a catalog of sex differences in gene expression and in the genetic regulation of gene expression across 44 human tissue sources surveyed by the Genotype-Tissue Expression project (GTEx, v8 release). We demonstrate that sex influences gene expression levels and cellular composition of tissue samples across the human body. A total of 37% of all genes exhibit sex-biased expression in at least one tissue. We identify cis expression quantitative trait loci (eQTLs) with sex-differentiated effects and characterize their cellular origin. By integrating sex-biased eQTLs with genome-wide association study data, we identify 58 gene-trait associations that are driven by genetic regulation of gene expression in a single sex. These findings provide an extensive characterization of sex differences in the human transcriptome and its genetic regulation.

View details for DOI 10.1126/science.aba3066

View details for PubMedID 32913072
Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome biology Gay, N. R., Gloudemans, M., Antonio, M. L., Abell, N. S., Balliu, B., Park, Y., Martin, A. R., Musharoff, S., Rao, A. S., Aguet, F., Barbeira, A. N., Bonazzola, R., Hormozdiari, F., Ardlie, K. G., Brown, C. D., Im, H. K., Lappalainen, T., Wen, X., Montgomery, S. B. 2020; 21 (1): 233

Abstract

Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing ancestry-based adjustments in GTEx improves portability of this research across populations and further characterizes the impact of population structure on GWAS colocalization. Here, we identify a subset of 117 individuals in GTEx (v8) with a high degree of population admixture and estimate genome-wide local ancestry. We perform genome-wide cis-eQTL mapping using admixed samples in seven tissues, adjusted by either global or local ancestry. Consistent with previous work, we observe improved power with local ancestry adjustment. At loci where the two adjustments produce different lead variants, we observe 31 loci (0.02%) where a significant colocalization is called only with one eQTL ancestry adjustment method. Notably, both adjustments produce similar numbers of significant colocalizations within each of two different colocalization methods, COLOC and FINEMAP. Finally, we identify a small subset of eQTL-associated variants highly correlated with local ancestry, providing a resource to enhance functional follow-up. We provide a local ancestry map for admixed individuals in the GTEx v8 release and describe the impact of ancestry and admixture on gene expression, eQTLs, and GWAS colocalization. While the majority of the results are concordant between local and global ancestry-based adjustments, we identify distinct advantages and disadvantages to each approach.

View details for DOI 10.1186/s13059-020-02113-0

View details for PubMedID 32912333
Transcriptomic signatures across human tissues identify functional rare genetic variation. Science (New York, N.Y.) Ferraro, N. M., Strober, B. J., Einson, J., Abell, N. S., Aguet, F., Barbeira, A. N., Brandt, M., Bucan, M., Castel, S. E., Davis, J. R., Greenwald, E., Hess, G. T., Hilliard, A. T., Kember, R. L., Kotis, B., Park, Y., Peloso, G., Ramdas, S., Scott, A. J., Smail, C., Tsang, E. K., Zekavat, S. M., Ziosi, M., Aradhana, Ardlie, K. G., Assimes, T. L., Bassik, M. C., Brown, C. D., Correa, A., Hall, I., Im, H. K., Li, X., Natarajan, P., Lappalainen, T., Mohammadi, P., Montgomery, S. B., Battle, A. 2020; 369 (6509)

Abstract

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.

View details for DOI 10.1126/science.aaz5900

View details for PubMedID 32913073
FAM13A affects body fat distribution and adipocyte function. Nature communications Fathzadeh, M., Li, J., Rao, A., Cook, N., Chennamsetty, I., Seldin, M., Zhou, X., Sangwung, P., Gloudemans, M. J., Keller, M., Attie, A., Yang, J., Wabitsch, M., Carcamo-Orive, I., Tada, Y., Lusis, A. J., Shin, M. K., Molony, C. M., McLaughlin, T., Reaven, G., Montgomery, S. B., Reilly, D., Quertermous, T., Ingelsson, E., Knowles, J. W. 2020; 11 (1): 1465

Abstract

Genetic variation in the FAM13A (Family with Sequence Similarity 13 Member A) locus has been associated with several glycemic and metabolic traits in genome-wide association studies (GWAS). Here, we demonstrate that in humans, FAM13A alleles are associated with increased FAM13A expression in subcutaneous adipose tissue (SAT) and an insulin resistance-related phenotype (e.g. higher waist-to-hip ratio and fasting insulin levels, but lower body fat). In human adipocyte models, knockdown of FAM13A in preadipocytes accelerates adipocyte differentiation. In mice, Fam13a knockout (KO) mice have a lower visceral to subcutaneous fat (VAT/SAT) ratio after high-fat diet challenge, in comparison to their wild-type counterparts. Subcutaneous adipocytes in KO mice show a size distribution shift toward an increased number of smaller adipocytes, along with an improved adipogenic potential. Our results indicate that GWAS-associated variants within the FAM13A locus alter adipose FAM13A expression, which in turn regulates adipocyte differentiation and contributes to changes in body fat distribution.

View details for DOI 10.1038/s41467-020-15291-z

View details for PubMedID 32193374
A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation. Cell host & microbe Durrant, M. G., Li, M. M., Siranosian, B. A., Montgomery, S. B., Bhatt, A. S. 2019

Abstract

Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates. Additionally, we identified thousands of mobility genes, a subset of which have unknown function, opening avenues for exploration. Future application of MGEfinder to commensal bacteria will further illuminate bacterial adaptation and evolution.

View details for DOI 10.1016/j.chom.2019.10.022

View details for PubMedID 31862382
Genetic regulation of gene expression and splicing during a 10-year period of human aging. Genome biology Balliu, B., Durrant, M., Goede, O. d., Abell, N., Li, X., Liu, B., Gloudemans, M. J., Cook, N. L., Smith, K. S., Knowles, D. A., Pala, M., Cucca, F., Schlessinger, D., Jaiswal, S., Sabatti, C., Lind, L., Ingelsson, E., Montgomery, S. B. 2019; 20 (1): 230

Abstract

BACKGROUND: Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age. RESULTS: We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at a population and individual level. We observe that individuals are more similar to their own expression profiles later in life than profiles of other individuals their own age. We identify 1291 and 294 genes differentially expressed and alternatively spliced with age, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age. CONCLUSIONS: These findings demonstrate that, although the transcriptome and its genetic regulation is mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.

View details for DOI 10.1186/s13059-019-1840-y

View details for PubMedID 31684996
COMPREHENSIVE RNA ANALYSIS OF CEREBROSPINAL FLUID FROM LEPTOMENINGEAL METASTASES Polyak, D., Li, Y., Liu, B., Connolly, I., Khoeur, L., Kakusa, B., Johnson, E., Andersen, S., Pan, W., Nagpal, S., Montgomery, S. B., Gephart, M. OXFORD UNIV PRESS INC. 2019: 62

View details for Web of Science ID 000509478701108
Uganda Genome Resource Enables Insights into Population History and Genomic Discovery in Africa. Cell Gurdasani, D., Carstensen, T., Fatumo, S., Chen, G., Franklin, C. S., Prado-Martinez, J., Bouman, H., Abascal, F., Haber, M., Tachmazidou, I., Mathieson, I., Ekoru, K., DeGorter, M. K., Nsubuga, R. N., Finan, C., Wheeler, E., Chen, L., Cooper, D. N., Schiffels, S., Chen, Y., Ritchie, G. R., Pollard, M. O., Fortune, M. D., Mentzer, A. J., Garrison, E., Bergstrom, A., Hatzikotoulas, K., Adeyemo, A., Doumatey, A., Elding, H., Wain, L. V., Ehret, G., Auer, P. L., Kooperberg, C. L., Reiner, A. P., Franceschini, N., Maher, D. P., Montgomery, S. B., Kadie, C., Widmer, C., Xue, Y., Seeley, J., Asiki, G., Kamali, A., Young, E. H., Pomilla, C., Soranzo, N., Zeggini, E., Pirie, F., Morris, A. P., Heckerman, D., Tyler-Smith, C., Motala, A., Rotimi, C., Kaleebu, P., Barroso, I., Sandhu, M. S. 2019; 179 (4): 984

Abstract

Genomic studies in African populations provide unique opportunities to understand disease etiology, human diversity, and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans was best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, hematological, lipid, and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region.

View details for DOI 10.1016/j.cell.2019.10.004

View details for PubMedID 31675503
Atheroprotective roles of smooth muscle cell phenotypic modulation and the TCF21 disease gene as revealed by single-cell analysis. Nature medicine Wirka, R. C., Wagh, D., Paik, D. T., Pjanic, M., Nguyen, T., Miller, C. L., Kundu, R., Nagao, M., Coller, J., Koyano, T. K., Fong, R., Woo, Y. J., Liu, B., Montgomery, S. B., Wu, J. C., Zhu, K., Chang, R., Alamprese, M., Tallquist, M. D., Kim, J. B., Quertermous, T. 2019

Abstract

In response to various stimuli, vascular smooth muscle cells (SMCs) can de-differentiate, proliferate and migrate in a process known as phenotypic modulation. However, the phenotype of modulated SMCs in vivo during atherosclerosis and the influence of this process on coronary artery disease (CAD) risk have not been clearly established. Using single-cell RNA sequencing, we comprehensively characterized the transcriptomic phenotype of modulated SMCs in vivo in atherosclerotic lesions of both mouse and human arteries and found that these cells transform into unique fibroblast-like cells, termed 'fibromyocytes', rather than into a classical macrophage phenotype. SMC-specific knockout of TCF21, a causal CAD gene, markedly inhibited SMC phenotypic modulation in mice, leading to the presence of fewer fibromyocytes within lesions as well as within the protective fibrous cap of the lesions. Moreover, TCF21 expression was strongly associated with SMC phenotypic modulation in diseased human coronary arteries, and higher levels of TCF21 expression were associated with decreased CAD risk in human CAD-relevant tissues. These results establish a protective role for both TCF21 and SMC phenotypic modulation in this disease.

View details for DOI 10.1038/s41591-019-0512-5

View details for PubMedID 31359001
Identifying causal variants and genes using functional genomics in specialized cell types and contexts. Human genetics Liu, B., Montgomery, S. B. 2019

Abstract

A central goal in human genetics is the identification of variants and genes that influence the risk of polygenic diseases. In the past decade, genome-wide association studies (GWAS) have identified tens of thousands of genetic loci associated with various diseases. Since the majority of such loci lie within non-coding regions and have many candidate variants in linkage disequilibrium, it has been challenging to accurately identify specific causal variants and genes. To aid in their discovery a variety of statistical and experimental approaches have been developed. These approaches often borrow information from functional genomics assays such as ATAC-seq, ChIP-seq and RNA-seq to annotate functional variants and identify regulatory relationships between variants and genes. While such approaches are powerful, given the diversity of cell types and environments, it is paramount to select disease-relevant contexts for follow-up analyses. In this review, we discuss the latest developments, challenges, and best practices for determining the causal mechanisms of polygenic disease risk variants with functional genomics data from specialized cell types.

View details for DOI 10.1007/s00439-019-02044-2

View details for PubMedID 31317254
Disease mechanisms elucidated by genetic regulation of human RPE gene expression Vollrath, D., Liu, B., Calton, M. A., Abell, N. S., Benchorin, G., Gloudemans, M. J., Chen, M., Hu, J., Li, X., Balliu, B., Bok, D., Montgomery, S. B. ASSOC RESEARCH VISION OPHTHALMOLOGY INC. 2019

View details for Web of Science ID 000488628104139
Genetic analyses of human fetal retinal pigment epithelium gene expression suggest ocular disease mechanisms. Communications biology Liu, B., Calton, M. A., Abell, N. S., Benchorin, G., Gloudemans, M. J., Chen, M., Hu, J., Li, X., Balliu, B., Bok, D., Montgomery, S. B., Vollrath, D. 2019; 2 (1): 186

Abstract

The retinal pigment epithelium (RPE) serves vital roles in ocular development and retinal homeostasis but has limited representation in large-scale functional genomics datasets. Understanding how common human genetic variants affect RPE gene expression could elucidate the sources of phenotypic variability in selected monogenic ocular diseases and pinpoint causal genes at genome-wide association study (GWAS) loci. We interrogated the genetics of gene expression of cultured human fetal RPE (fRPE) cells under two metabolic conditions and discovered hundreds of shared or condition-specific expression or splice quantitative trait loci (e/sQTLs). Co-localizations of fRPE e/sQTLs with age-related macular degeneration (AMD) and myopia GWAS data suggest new candidate genes, and mechanisms by which a common RDH5 allele contributes to both increased AMD risk and decreased myopia risk. Our study highlights the unique transcriptomic characteristics of fRPE and provides a resource to connect e/sQTLs in a critical ocular cell type to monogenic and complex eye disorders.

View details for DOI 10.1038/s42003-019-0430-6

View details for PubMedID 31925026
Abundant associations with gene expression complicate GWAS follow-up NATURE GENETICS Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E., Montgomery, S. B. 2019; 51 (5): 768–69

View details for DOI 10.1038/s41588-019-0404-0

View details for Web of Science ID 000466842000002
Identification of 22 novel loci associated with urinary biomarkers of albumin, sodium, and potassium excretion KIDNEY INTERNATIONAL Zanetti, D., Rao, A., Gustafsson, S., Assimes, T. L., Montgomery, S. B., Ingelsson, E. 2019; 95 (5): 1197–1208

View details for DOI 10.1016/j.kint.2018.12.017

View details for Web of Science ID 000465213400023
Transcriptional and Position Effect Contributions to rAAV-Mediated Gene Targeting Spector, L. P., Tiffany, M., Ferraro, N. M., Abell, N. S., Montgomery, S. B., Kay, M. A. CELL PRESS. 2019: 294

View details for Web of Science ID 000464381003086
Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays ARCHIVES OF PATHOLOGY & LABORATORY MEDICINE Merker, J. D., Devereaux, K., Iafrate, A., Kamel-Reid, S., Kim, A. S., Moncur, J. T., Montgomery, S. B., Nagarajan, R., Portier, B. P., Routbort, M. J., Smail, C., Surrey, L. F., Vasalos, P., Lazar, A. J., Lindeman, N. 2019; 143 (4): 463–71

View details for DOI 10.5858/arpa.2018-0336-CP

View details for Web of Science ID 000462602800014
A toolkit for genetics providers in follow-up of patients with non-diagnostic exome sequencing JOURNAL OF GENETIC COUNSELING Zastrow, D. B., Kohler, J. N., Bonner, D., Reuter, C. M., Fernandez, L., Grove, M. E., Fisk, D. G., Yang, Y., Eng, C. M., Ward, P. A., Bick, D., Worthey, E. A., Fisher, P. G., Ashley, E. A., Bernstein, J. A., Wheeler, M. T., Adams, D. R., Aday, A., Alejandro, M. E., Allard, P., Ashley, E. A., Azamian, M. S., Bacino, C. A., Baker, E., Balasubramanyam, A., Barseghyan, H., Batzli, G. F., Beggs, A. H., Behnam, B., Bellen, H. J., Bernstein, J. A., Bican, A., Bick, D. P., Birch, C. L., Boone, B. E., Bostwick, B. L., Briere, L. C., Brokamp, E., Brown, D. M., Brush, M., Burke, E. A., Burrage, L. C., Butte, M. J., Chen, S., Clark, G. D., Coakley, T. R., Cogan, J. D., Colley, H. A., Cooper, C. M., Cope, H., Craigen, W. J., D'Souza, P., Davids, M., Dayal, J. G., Dell'Angelica, E. C., Dhar, S. U., Dipple, K. M., Donnell-Fink, L. A., Dorrani, N., Dorset, D. C., Douine, E. D., Draper, D. D., Dries, A. M., Eckstein, D. J., Emrick, L. T., Eng, C. M., Enns, G. M., Eskin, A., Esteves, C., Estwick, T., Fairbrother, L., Ferreira, C., Fieg, E. L., Fisher, P. G., Fogel, B. L., Gahl, W. A., Glanton, E., Godfrey, R. A., Goldman, A. M., Goldstein, D. B., Gould, S. E., Gourdine, J. F., Groden, C. A., Gropman, A. L., Haendel, M., Hamid, R., Hanchard, N. A., High, F., Holm, I. A., Hom, J., Howerton, E. M., Huang, Y., Jamal, F., Jiang, Y., Johnston, J. M., Jones, A. L., Karaviti, L., Koeller, D. M., Kohane, I. S., Krasnewich, D. M., Korrick, S., Koziura, M., Krier, J. B., Kyle, J. E., Lalani, S. R., Lau, C., Lazar, J., LeBlanc, K., Lee, B. H., Lee, H., Levy, S. E., Lewis, R. A., Lincoln, S. A., Loo, S. K., Loscalzo, J., Maas, R. L., Macnamara, E. F., MacRae, C. A., Maduro, V. V., Majcherska, M. M., Malicdan, M. V., Mamounas, L. A., Manolio, T. A., Markello, T. C., Marom, R., Martin, G., Martinez-Agosto, J. A., Marwaha, S., May, T., McConkie-Rosell, A., McCormack, C. E., McCray, A. T., Merker, J. D., Metz, T. O., Might, M., Moretti, P. M., Morimoto, M., Nehrebecky, M. E., Nelson, S. F., Newberry, J., Newman, J. H., Nicholas, S. K., Novacic, D., Orange, J. S., Orengo, J. P., Pallais, J., Palmer, C. G. S., Papp, J. C., Postlethwait, J. H., Potocki, L., Pusey, B. N., Rives, L., Robertson, A. K., Rodan, L. H., Rosenfeld, J. A., Sampson, J. B., Samson, S. L., Schoch, K., Scott, D. A., Shakachite, L., Sharma, P., Shashi, V., Signer, R., Silverman, E. K., Sinsheimer, J. S., Smith, K. S., Spillmann, R. C., Stoler, J. M., Stong, N., Sullivan, J. A., Sweetser, D. A., Tan, Q., Tifft, C. J., Toro, C., Tran, A. A., Urv, T. K., Vilain, E., Vogel, T. P., Waggott, D. M., Wahl, C. E., Walley, N. M., Walsh, C. A., Walker, M., Wan, J., Wangler, M. F., Ward, P. A., Waters, K. M., Webb-Robertson, B. M., Westerfield, M., Wheeler, M. T., Wise, A. L., Wolfe, L. A., Worthey, E. A., Yamamoto, S., Yang, J., Yang, Y., Yoon, A. J., Yu, G., Zhao, C., Zheng, A., Undiagnosed Dis Network 2019; 28 (2): 213–28

View details for DOI 10.1002/jgc4.1119

View details for Web of Science ID 000463993600005
Identification of 22 novel loci associated with urinary biomarkers of albumin, sodium, and potassium excretion. Kidney international Zanetti, D., Rao, A., Gustafsson, S., Assimes, T. L., Montgomery, S. B., Ingelsson, E. 2019

Abstract

Urine biomarkers reflecting kidney function and handling of dietary sodium and potassium are strongly associated with several common diseases including chronic kidney disease, cardiovascular disease, and diabetes mellitus. Knowledge about the genetic determinants of these biomarkers may shed light on pathophysiological mechanisms underlying the development of these diseases. We performed genome-wide association studies of urinary albumin:creatinine ratio (UACR), urinary potassium:creatinine ratio (UK/UCr), urinary sodium:creatinine ratio (UNa/UCr) and urinary sodium:potassium ratio (UNa/UK) in up to 218,450 (discovery) and 109,166 (replication) unrelated individuals of European ancestry from the UK Biobank. Further, we explored genetic correlations, tissue-specific gene expression, and possible genes implicated in the regulation of these biomarkers. After replication, we identified 19 genome-wide significant independent loci associated with UACR, 6 each with UK/UCr and UNa/UCr, and 4 with UNa/UK. In addition to 22 novel associations, we confirmed several established associations, including between the CUBN locus and microalbuminuria. We detected high pairwise genetic correlation across the urinary biomarkers, and between their levels and several physiological measurements. We highlight GIPR, a potential diabetes drug target, as possibly implicated in the genetic control of urinary potassium excretion, and NRBP1, a locus associated with gout, as plausibly involved in sodium and albumin excretion. Overall, we identified 22 novel genome-wide significant associations with urinary biomarkers and confirmed several previously established associations, providing new insights into the genetic basis of these traits and their connection to chronic diseases.

View details for PubMedID 30910378
Abundant associations with gene expression complicate GWAS follow-up. Nature genetics Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E., Montgomery, S. B. 2019; 51 (5): 768–69

View details for PubMedID 31043754
SEX DIFFERENCES AT THE MOLECULAR LEVEL: LESSONS FROM THE HUMAN TRANSCRIPTOME Stranger, B., Oliva, M., Gamazon, E., Reverter, F., Wucher, V., Balliu, B., Dumitrascu, B., Parsana, P., Payne, A., Jo, B., Montgomery, S., Battle, A., Ardlie, K., Guigo, R., Engelhardt, B. ELSEVIER. 2019: 1034

View details for DOI 10.1016/j.euroneuro.2018.07.028

View details for Web of Science ID 000477708400028
A toolkit for genetics providers in follow-up of patients with non-diagnostic exome sequencing. Journal of genetic counseling Zastrow, D. B., Kohler, J. N., Bonner, D., Reuter, C. M., Fernandez, L., Grove, M. E., Fisk, D. G., Yang, Y., Eng, C. M., Ward, P. A., Bick, D., Worthey, E. A., Fisher, P. G., Ashley, E. A., Bernstein, J. A., Wheeler, M. T. 2019; 28 (2): 213–28

Abstract

There are approximately 7,000 rare diseases affecting 25-30 million Americans, with 80% estimated to have a genetic basis. This presents a challenge for genetics practitioners to determine appropriate testing, make accurate diagnoses, and conduct up-to-date patient management. Exome sequencing (ES) is a comprehensive diagnostic approach, but only 25%-41% of the patients receive a molecular diagnosis. The remaining three-fifths to three-quarters of patients undergoing ES remain undiagnosed. The Stanford Center for Undiagnosed Diseases (CUD), a clinical site of the Undiagnosed Diseases Network, evaluates patients with undiagnosed and rare diseases using a combination of methods including ES. Frequently these patients have non-diagnostic ES results, but strategic follow-up techniques identify diagnoses in a subset. We present techniques used at the CUD that can be adopted by genetics providers in clinical follow-up of cases where ES is non-diagnostic. Solved case examples illustrate different types of non-diagnostic results and the additional techniques that led to a diagnosis. Frequent approaches include segregation analysis, data reanalysis, genome sequencing, additional variant identification, careful phenotype-disease correlation, confirmatory testing, and case matching. We also discuss prioritization of cases for additional analyses.

View details for PubMedID 30964584
Genetic analyses of human fetal retinal pigment epithelium gene expression suggest ocular disease mechanisms. Communications biology Liu, B., Calton, M. A., Abell, N. S., Benchorin, G., Gloudemans, M. J., Chen, M., Hu, J., Li, X., Balliu, B., Bok, D., Montgomery, S. B., Vollrath, D. 2019; 2: 186

View details for DOI 10.1038/s42003-019-0430-6

View details for PubMedID 31123710
Pathologic gene network rewiring implicates PPP1R3A as a central regulator in pressure overload heart failure. Nature communications Cordero, P., Parikh, V. N., Chin, E. T., Erbilgin, A., Gloudemans, M. J., Shang, C., Huang, Y., Chang, A. C., Smith, K. S., Dewey, F., Zaleta, K., Morley, M., Brandimarto, J., Glazer, N., Waggott, D., Pavlovic, A., Zhao, M., Moravec, C. S., Tang, W. H., Skreen, J., Malloy, C., Hannenhalli, S., Li, H., Ritter, S., Li, M., Bernstein, D., Connolly, A., Hakonarson, H., Lusis, A. J., Margulies, K. B., Depaoli-Roach, A. A., Montgomery, S. B., Wheeler, M. T., Cappola, T., Ashley, E. A. 2019; 10 (1): 2760

Abstract

Heart failure is a leading cause of mortality, yet our understanding of the genetic interactions underlying this disease remains incomplete. Here, we harvest 1352 healthy and failing human hearts directly from transplant center operating rooms, and obtain genome-wide genotyping and gene expression measurements for a subset of 313. We build failing and non-failing cardiac regulatory gene networks, revealing important regulators and cardiac expression quantitative trait loci (eQTLs). PPP1R3A emerges as a regulator whose network connectivity changes significantly between health and disease. RNA sequencing after PPP1R3A knockdown validates network-based predictions, and highlights metabolic pathway regulation associated with increased cardiomyocyte size and perturbed respiratory metabolism. Mice lacking PPP1R3A are protected against pressure-overload heart failure. We present a global gene interaction map of the human heart failure transition, identify previously unreported cardiac eQTLs, and demonstrate the discovery potential of disease-specific networks through the description of PPP1R3A as a central regulator in heart failure.

View details for DOI 10.1038/s41467-019-10591-5

View details for PubMedID 31235787
Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nature medicine Frésard, L., Smail, C., Ferraro, N. M., Teran, N. A., Li, X., Smith, K. S., Bonner, D., Kernohan, K. D., Marwaha, S., Zappala, Z., Balliu, B., Davis, J. R., Liu, B., Prybol, C. J., Kohler, J. N., Zastrow, D. B., Reuter, C. M., Fisk, D. G., Grove, M. E., Davidson, J. M., Hartley, T., Joshi, R., Strober, B. J., Utiramerur, S., Lind, L., Ingelsson, E., Battle, A., Bejerano, G., Bernstein, J. A., Ashley, E. A., Boycott, K. M., Merker, J. D., Wheeler, M. T., Montgomery, S. B. 2019

Abstract

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases. This includes muscle biopsies from patients with undiagnosed rare muscle disorders, and cultured fibroblasts from patients with mitochondrial disorders. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

View details for DOI 10.1038/s41591-019-0457-8

View details for PubMedID 31160820
Diagnosing rare diseases after the exome. Cold Spring Harbor molecular case studies Fresard, L., Montgomery, S. B. 2018; 4 (6)

Abstract

High-throughput sequencing has ushered in a diversity of approaches for identifying genetic variants and understanding genome structure and function. When applied to individuals with rare genetic diseases, these approaches have greatly accelerated gene discovery and patient diagnosis. Over the past decade, exome sequencing has emerged as a comprehensive and cost-effective approach to identify pathogenic variants in the protein-coding regions of the genome. However, for individuals in whom exome-sequencing fails to identify a pathogenic variant, we discuss recent advances that are helping to reduce the diagnostic gap.

View details for PubMedID 30559314
Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays. Archives of pathology & laboratory medicine Merker, J. D., Devereaux, K., Iafrate, A. J., Kamel-Reid, S., Kim, A. S., Moncur, J. T., Montgomery, S. B., Nagarajan, R., Portier, B. P., Routbort, M. J., Smail, C., Surrey, L. F., Vasalos, P., Lazar, A. J., Lindeman, N. I. 2018

Abstract

CONTEXT: Next-generation sequencing-based assays are being increasingly used in the clinical setting for the detection of somatic variants in solid tumors, but limited data are available regarding the interlaboratory performance of these assays. OBJECTIVE: To examine proficiency testing data from the initial College of American Pathologists (CAP) Next-Generation Sequencing Solid Tumor survey to report on laboratory performance. DESIGN: CAP proficiency testing results from 111 laboratories were analyzed for accuracy and associated assay performance characteristics. RESULTS: The overall accuracy observed for all variants was 98.3%. Rare false-negative results could not be attributed to sequencing platform, selection method, or other assay characteristics. The median and average of the variant allele fractions reported by the laboratories were within 10% of those orthogonally determined by digital polymerase chain reaction for each variant. The median coverage reported at the variant sites ranged from 1922 to 3297. CONCLUSIONS: Laboratories demonstrated an overall accuracy of greater than 98% with high specificity when examining 10 clinically relevant somatic single-nucleotide variants with a variant allele fraction of 15% or greater. These initial data suggest excellent performance, but further ongoing studies are needed to evaluate the performance of lower variant allele fractions and additional variant types.

View details for PubMedID 30376374
Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci AMERICAN JOURNAL OF HUMAN GENETICS Liu, B., Pjanic, M., Wang, T., Nguyen, T., Gloudemans, M., Rao, A., Castano, V. G., Nurnberg, S., Rader, D. J., Elwyn, S., Ingelsson, E., Montgomery, S. B., Miller, C. L., Quertermous, T. 2018; 103 (3): 377–88

View details for DOI 10.1016/j.ajhg.2018.08.001

View details for Web of Science ID 000443819500007
Large-Scale Phenome-Wide Association Study of PCSK9 Variants Demonstrates Protection Against Ischemic Stroke CIRCULATION-GENOMIC AND PRECISION MEDICINE Rao, A. S., Lindholm, D., Rivas, M. A., Knowles, J. W., Montgomery, S. B., Ingelsson, E. 2018; 11 (7): e002162

Abstract

PCSK9 inhibition is a potent new therapy for hypercholesterolemia and cardiovascular disease. Although short-term clinical trial results have not demonstrated major adverse effects, long-term data will not be available for some time. Genetic studies in large biobanks offer a unique opportunity to predict drug effects and provide context for the evaluation of future clinical trial outcomes. We tested the association of the PCSK9 missense variant rs11591147 with predefined phenotypes and phenome-wide, in 337,536 individuals of British ancestry in the UK Biobank, with independent discovery and replication. Using a Bayesian statistical method, we leveraged phenotype correlations to evaluate the phenome-wide impact of PCSK9 inhibition with higher power at a finer resolution. The T allele of rs11591147 showed a protective effect on hyperlipidemia (odds ratio, 0.63±0.04; P=2.32×10⁻³⁸), coronary heart disease (odds ratio, 0.73±0.09; P=1.05×10⁻⁶), and ischemic stroke (odds ratio, 0.61±0.18; P=2.40×10⁻³) and was associated with increased type 2 diabetes mellitus risk adjusted for lipid-lowering medication status (odds ratio, 1.24±0.10; P=1.98×10⁻⁷). We did not observe associations with cataracts, heart failure, atrial fibrillation, and cognitive dysfunction. Leveraging phenotype correlations, we observed evidence of a protective association with cerebral infarction and vascular occlusion. These results explore the effects of direct PCSK9 inhibition; off-target effects cannot be predicted using this approach. This result represents the first genetic evidence in a large cohort for the protective effect of PCSK9 inhibition on ischemic stroke and corroborates exploratory evidence from clinical trials. PCSK9 inhibition was not associated with variables other than those related to LDL (low-density lipoprotein) cholesterol, atherosclerosis, and type 2 diabetes mellitus, suggesting that other effects are either small or absent.

View details for PubMedID 29997226
Ubiquitination of ABCE1 by NOT4 in Response to Mitochondrial Damage Links Co-translational Quality Control to PINK1-Directed Mitophagy. Cell metabolism Wu, Z., Wang, Y., Lim, J., Liu, B., Li, Y., Vartak, R., Stankiewicz, T., Montgomery, S., Lu, B. 2018

Abstract

Translation of mRNAs is tightly regulated and constantly surveyed for errors. Aberrant translation can trigger co-translational protein and RNA quality control processes, impairments of which cause neurodegeneration by still poorly understood mechanism(s). Here we show that quality control of translation of mitochondrial outer membrane (MOM)-localized mRNA intersects with the turnover of damaged mitochondria, both orchestrated by the mitochondrial kinase PINK1. Mitochondrial damage causes stalled translation of complex-I 30 kDa subunit (C-I30) mRNA on MOM, triggering the recruitment of co-translational quality control factors Pelo, ABCE1, and NOT4 to the ribosome/mRNA-ribonucleoprotein complex. Damage-induced ubiquitination of ABCE1 by NOT4 generates poly-ubiquitin signals that attract autophagy receptors to MOM to initiate mitophagy. In the Drosophila PINK1 model, these factors act synergistically to restore mitophagy and neuromuscular tissue integrity. Thus ribosome-associated co-translational quality control generates an early signal to trigger mitophagy. Our results have broad therapeutic implications for the understanding and treatment of neurodegenerative diseases.

View details for PubMedID 29861391
Recurrently Mutated Genes Differ between Leptomeningeal and Solid Lung Cancer Brain Metastases. Journal of thoracic oncology : official publication of the International Association for the Study of Lung Cancer Li, Y., Liu, B., Connolly, I. D., Kakusa, B. W., Pan, W., Nagpal, S., Montgomery, S. B., Hayden Gephart, M. 2018

Abstract

When compared with solid brain metastases from NSCLC, leptomeningeal disease (LMD) has unique growth patterns and is rapidly fatal. Patients with LMD do not undergo surgical resection, limiting the tissue available for scientific research. In this study we performed whole exome sequencing on eight samples of LMD to identify somatic mutations and compared the results with those for 26 solid brain metastases. We found that taste 2 receptor member 31 gene (TAS2R31) and phosphodiesterase 4D interacting protein gene (PDE4DIP) were recurrently mutated among LMD samples, suggesting involvement in LMD progression. Together with a retrospective review of the charts of an additional 44 patients with NSCLC LMD, we discovered a surprisingly low number of KRAS mutations (n = 4 [7.7%]) but a high number of EGFR mutations (n = 33 [63.5%]). The median interval for development of LMD from NSCLC was shorter in patients with mutant EGFR (16.3 months) than in patients with wild-type EGFR (23.9 months) (p = 0.017). Targeted analysis of recurrent mutations thus presents a useful complement to the existing diagnostic tool kit, and correlations of EGFR in LMD and KRAS in solid metastases suggest that molecular distinctions or systemic treatment pressure underpin the differences in growth patterns within the brain.

View details for PubMedID 29604399
Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder AMERICAN JOURNAL OF HUMAN GENETICS Olahova, M., Yoon, W., Thompson, K., Jangam, S., Fernandez, L., Davidson, J. M., Kyle, J. E., Grove, M. E., Fisk, D. G., Kohler, J. N., Holmes, M., Dries, A. M., Huang, Y., Zhao, C., Contrepois, K., Zappala, Z., Fresard, L., Waggott, D., Zink, E. M., Kim, Y., Heyman, H. M., Stratton, K. G., Webb-Robertson, B. M., Snyder, M., Merker, J. D., Montgomery, S. B., Fisher, P. G., Feichtinger, R. G., Mayr, J. A., Hall, J., Barbosa, I. A., Simpson, M. A., Deshpande, C., Waters, K. M., Koeller, D. M., Metz, T. O., Morris, A. A., Schelley, S., Cowan, T., Friederich, M. W., McFarland, R., Van Hove, J. L. K., Enns, G. M., Yamamoto, S., Ashley, E. A., Wangler, M. F., Taylor, R. W., Bellen, H. J., Bernstein, J. A., Wheeler, M. T., Undiagnosed Diseases Network 2018; 102 (3): 494–504

Abstract

ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.

View details for PubMedID 29478781
Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci. American journal of human genetics Liu, B., Pjanic, M., Wang, T., Nguyen, T., Gloudemans, M., Rao, A., Castano, V. G., Nurnberg, S., Rader, D. J., Elwyn, S., Ingelsson, E., Montgomery, S. B., Miller, C. L., Quertermous, T. 2018

Abstract

Coronary artery disease (CAD) is the leading cause of death globally. Genome-wide association studies (GWASs) have identified more than 95 independent loci that influence CAD risk, most of which reside in non-coding regions of the genome. To interpret these loci, we generated transcriptome and whole-genome datasets using human coronary artery smooth muscle cells (HCASMCs) from 52 unrelated donors, as well as epigenomic datasets using ATAC-seq on a subset of 8 donors. Through systematic comparison with publicly available datasets from GTEx and ENCODE projects, we identified transcriptomic, epigenetic, and genetic regulatory mechanisms specific to HCASMCs. We assessed the relevance of HCASMCs to CAD risk using transcriptomic and epigenomic level analyses. By jointly modeling eQTL and GWAS datasets, we identified five genes (SIPA1, TCF21, SMAD3, FES, and PDGFRA) that may modulate CAD risk through HCASMCs, all of which have relevant functional roles in vascular remodeling. Comparison with GTEx data suggests that SIPA1 and PDGFRA influence CAD risk predominantly through HCASMCs, while other annotated genes may have multiple cell and tissue targets. Together, these results provide tissue-specific and mechanistic insights into the regulation of a critical vascular cell type associated with CAD in human populations.

View details for PubMedID 30146127
Functional regulatory mechanism of smooth muscle cell-restricted LMOD1 coronary artery disease locus. PLoS genetics Nanda, V., Wang, T., Pjanic, M., Liu, B., Nguyen, T., Matic, L. P., Hedin, U., Koplev, S., Ma, L., Franzén, O., Ruusalepp, A., Schadt, E. E., Björkegren, J. L., Montgomery, S. B., Snyder, M. P., Quertermous, T., Leeper, N. J., Miller, C. L. 2018; 14 (11): e1007755

Abstract

Recent genome-wide association studies (GWAS) have identified multiple new loci which appear to alter coronary artery disease (CAD) risk via arterial wall-specific mechanisms. One of the annotated genes encodes LMOD1 (Leiomodin 1), a member of the actin filament nucleator family that is highly enriched in smooth muscle-containing tissues such as the artery wall. However, it is still unknown whether LMOD1 is the causal gene at this locus and also how the associated variants alter LMOD1 expression/function and CAD risk. Using epigenomic profiling we recently identified a non-coding regulatory variant, rs34091558, which is in tight linkage disequilibrium (LD) with the lead CAD GWAS variant, rs2820315. Herein we demonstrate through expression quantitative trait loci (eQTL) and statistical fine-mapping in GTEx, STARNET, and human coronary artery smooth muscle cell (HCASMC) datasets that rs34091558 is the top regulatory variant for LMOD1 in vascular tissues. Position weight matrix (PWM) analyses indicate that the protective allele rs34091558-TA forms a conserved Forkhead box O3 (FOXO3) binding motif, which is disrupted by the risk allele rs34091558-A. FOXO3 chromatin immunoprecipitation and reporter assays show reduced FOXO3 binding and LMOD1 transcriptional activity by the risk allele, consistent with effects of FOXO3 downregulation on LMOD1. LMOD1 knockdown results in increased proliferation and migration and decreased cell contraction in HCASMC, and immunostaining in atherosclerotic lesions in the SMC lineage tracing reporter mouse supports a key role for LMOD1 in maintaining the differentiated SMC phenotype. These results provide compelling functional evidence that genetic variation is associated with dysregulated LMOD1 expression/function in SMCs, together contributing to the heritable risk for CAD.

View details for PubMedID 30444878
Allele-specific expression reveals interactions between genetic variation and environment. Nature methods Knowles, D. A., Davis, J. R., Edgington, H., Raj, A., Favé, M., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Levinson, D. F., Awadalla, P., Mostafavi, S., Montgomery, S. B., Battle, A. 2017

Abstract

Identifying interactions between genetics and the environment (GxE) remains challenging. We have developed EAGLE, a hierarchical Bayesian model for identifying GxE interactions based on associations between environmental variables and allele-specific expression. Combining whole-blood RNA-seq with extensive environmental annotations collected from 922 human individuals, we identified 35 GxE interactions, compared with only four using standard GxE interaction testing. EAGLE provides new opportunities for researchers to identify GxE interactions using functional genomic data.

View details for DOI 10.1038/nmeth.4298

View details for PubMedID 28530654
Population- and individual- specific regulatory variation in Sardinia NATURE GENETICS Pala, M., Zappala, Z., Marongiu, M., Li, X., Davis, J. R., Cusano, R., Crobu, F., Kukurba, K. R., Gloudemans, M. J., Reinier, F., Berutti, R., Piras, M. G., Mulas, A., Zoledziewska, M., Marongiu, M., Sorokin, E. P., Hess, G. T., Smith, K. S., Busonero, F., Maschio, A., Steri, M., Sidore, C., Sanna, S., Fiorillo, E., Bassik, M. C., Sawcer, S. J., Battle, A., Novembre, J., Jones, C., Angius, A., Abecasis, G. R., Schlessinger, D., Cucca, F., Montgomery, S. B. 2017; 49 (5): 700-?

Abstract

Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.

View details for DOI 10.1038/ng.3840

View details for Web of Science ID 000400051400010

View details for PubMedID 28394350
The impact of structural variation on human gene expression NATURE GENETICS Chiang, C., Scott, A. J., Davis, J. R., Tsang, E. K., Li, X., Kim, Y., Hadzic, T., Damani, F. N., Ganel, L., Montgomery, S. B., Battle, A., Conrad, D. F., Hall, I. M. 2017; 49 (5): 692-?

Abstract

Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5-6.8% of eQTLs (a substantially higher fraction than prior estimates) and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.

View details for DOI 10.1038/ng.3834

View details for PubMedID 28369037
PML nuclear bodies contribute to the basal expression of the mTOR inhibitor DDIT4 SCIENTIFIC REPORTS Salsman, J., Stathakis, A., Parker, E., Chung, D., Anthes, L. E., Koskowich, K. L., Lahsaee, S., Gaston, D., Kukurba, K. R., Smith, K. S., Chute, I. C., Leger, D., Frost, L. D., Montgomery, S. B., Lewis, S. M., Eskiw, C., Dellaire, G. 2017; 7

Abstract

The promyelocytic leukemia (PML) protein is an essential component of PML nuclear bodies (PML NBs) frequently lost in cancer. PML NBs coordinate chromosomal regions via modification of nuclear proteins that in turn may regulate genes in the vicinity of these bodies. However, few PML NB-associated genes have been identified. PML and PML NBs can also regulate mTOR and cell fate decisions in response to cellular stresses. We now demonstrate that PML depletion in U2OS cells or TERT-immortalized normal human diploid fibroblasts results in decreased expression of the mTOR inhibitor DDIT4 (REDD1). DNA and RNA immuno-FISH reveal that PML NBs are closely associated with actively transcribed DDIT4 loci, implicating these bodies in regulation of basal DDIT4 expression. Although PML silencing did reduce the sensitivity of U2OS cells to metabolic stress induced by metformin, PML loss did not inhibit the upregulation of DDIT4 in response to metformin, hypoxia-like (CoCl2) or genotoxic stress. Analysis of publicly available cancer data also revealed a significant correlation between PML and DDIT4 expression in several cancer types (e.g. lung, breast, prostate). Thus, these findings uncover a novel mechanism by which PML loss may contribute to mTOR activation and cancer progression via dysregulation of basal DDIT4 gene expression.

View details for DOI 10.1038/srep45038

View details for Web of Science ID 000397135000001

View details for PubMedID 28332630
Whole transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy (SMA-PME). Human mutation Kernohan, K. D., Frésard, L., Zappala, Z., Hartley, T., Smith, K. S., Wagner, J., Xu, H., McBride, A., Bourque, P. R., Consortium, C. R., Bennett, S. A., Dyment, D. A., Boycott, K. M., Montgomery, S. B., Warman-Chardon, J. 2017

Abstract

At least 15% of the disease-causing mutations affect mRNA splicing. Many splicing mutations are missed in a clinical setting due to limitations of in silico prediction algorithms or their location in noncoding regions. Whole-transcriptome sequencing is a promising new tool to identify these mutations; however, it will be a challenge to obtain disease-relevant tissue for RNA. Here, we describe an individual with a sporadic atypical spinal muscular atrophy, in whom clinical DNA sequencing reported one pathogenic ASAH1 mutation (c.458A>G;p.Tyr153Cys). Transcriptome sequencing on patient leukocytes identified a highly significant and atypical ASAH1 isoform not explained by c.458A>G (p < 10^-16). Subsequent Sanger-sequencing identified the splice mutation responsible for the isoform (c.504A>C;p.Lys168Asn) and provided a molecular diagnosis of autosomal-recessive spinal muscular atrophy with progressive myoclonic epilepsy. Our findings demonstrate the utility of RNA sequencing from blood to identify splice-impacting disease mutations for nonhematological conditions, providing a diagnosis for these otherwise unsolved patients.

View details for DOI 10.1002/humu.23211

View details for PubMedID 28251733
Small RNA Sequencing in Cells and Exosomes Identifies eQTLs and 14q32 as a Region of Active Export G3-GENES GENOMES GENETICS Tsang, E. K., Abell, N. S., Li, X., Anaya, V., Karczewski, K. J., Knowles, D. A., Sierra, R. G., Smith, K. S., Montgomery, S. B. 2017; 7 (1): 31-39

Abstract

Exosomes are small extracellular vesicles that carry heterogeneous cargo, including RNA, between cells. Increasing evidence suggests that exosomes are important mediators of intercellular communication and biomarkers of disease. Despite this, the variability of exosomal RNA between individuals has not been well quantified. To assess this variability, we sequenced the small RNA of cells and exosomes from a 17-member family. Across individuals, we show that selective export of miRNAs occurs not only at the level of specific transcripts, but that a cluster of 74 mature miRNAs on chromosome 14q32 is massively exported in exosomes while mostly absent from cells. We also observe more interindividual variability between exosomal samples than between cellular ones and identify four miRNA expression quantitative trait loci shared between cells and exosomes. Our findings indicate that genomically colocated miRNAs can be exported together and highlight the variability in exosomal miRNA levels between individuals as relevant for exosome use as diagnostics.

View details for DOI 10.1534/g3.116.036137

View details for Web of Science ID 000392200800003

View details for PubMedCentralID PMC5217120
FIRE: functional inference of genetic variants that regulate gene expression. Bioinformatics (Oxford, England) Ioannidis, N. M., Davis, J. R., DeGorter, M. K., Larson, N. B., McDonnell, S. K., French, A. J., Battle, A. J., Hastie, T. J., Thibodeau, S. N., Montgomery, S. B., Bustamante, C. D., Sieh, W., Whittemore, A. S. 2017; 33 (24): 3895–3901

Abstract

Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and coding SNVs based on their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis-expression quantitative trait loci (cis-eQTLs) using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis-eQTL SNVs from non-eQTL SNVs in the training set with a cross-validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis-eQTL SNVs shared across six populations of different ancestry from non-eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis-eQTL SNVs across a variety of tissue types. FIRE scores for genome-wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/. Contact: nilah@stanford.edu. Supplementary data are available at Bioinformatics online.

View details for PubMedID 28961785
Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genetics in medicine : official journal of the American College of Medical Genetics Merker, J. D., Wenger, A. M., Sneddon, T., Grove, M., Zappala, Z., Fresard, L., Waggott, D., Utiramerur, S., Hou, Y., Smith, K. S., Montgomery, S. B., Wheeler, M., Buchan, J. G., Lambert, C. C., Eng, K. S., Hickey, L., Korlach, J., Ford, J., Ashley, E. A. 2017

Abstract

Purpose: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS. Methods: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative. Results: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants. Conclusion: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS. GENETICS in MEDICINE advance online publication, 22 June 2017; doi:10.1038/gim.2017.86.

View details for PubMedID 28640241
Overexpression of the Cytokine BAFF and Autoimmunity Risk. New England journal of medicine Steri, M., Orrù, V., Idda, M. L., Pitzalis, M., Pala, M., Zara, I., Sidore, C., Faà, V., Floris, M., Deiana, M., Asunis, I., Porcu, E., Mulas, A., Piras, M. G., Lobina, M., Lai, S., Marongiu, M., Serra, V., Marongiu, M., Sole, G., Busonero, F., Maschio, A., Cusano, R., Cuccuru, G., Deidda, F., Poddie, F., Farina, G., Dei, M., Virdis, F., Olla, S., Satta, M. A., Pani, M., Delitala, A., Cocco, E., Frau, J., Coghe, G., Lorefice, L., Fenu, G., Ferrigno, P., Ban, M., Barizzone, N., Leone, M., Guerini, F. R., Piga, M., Firinu, D., Kockum, I., Lima Bomfim, I., Olsson, T., Alfredsson, L., Suarez, A., Carreira, P. E., Castillo-Palma, M. J., Marcus, J. H., Congia, M., Angius, A., Melis, M., Gonzalez, A., Alarcón Riquelme, M. E., da Silva, B. M., Marchini, M., Danieli, M. G., Del Giacco, S., Mathieu, A., Pani, A., Montgomery, S. B., Rosati, G., Hillert, J., Sawcer, S., D'Alfonso, S., Todd, J. A., Novembre, J., Abecasis, G. R., Whalen, M. B., Marrosu, M. G., Meloni, A., Sanna, S., Gorospe, M., Schlessinger, D., Fiorillo, E., Zoledziewska, M., Cucca, F. 2017; 376 (17): 1615-1626

Abstract

Genomewide association studies of autoimmune diseases have mapped hundreds of susceptibility regions in the genome. However, only for a few association signals has the causal gene been identified, and for even fewer have the causal variant and underlying mechanism been defined. Coincident associations of DNA variants affecting both the risk of autoimmune disease and quantitative immune variables provide an informative route to explore disease mechanisms and drug-targetable pathways. Using case-control samples from Sardinia, Italy, we performed a genomewide association study in multiple sclerosis followed by TNFSF13B locus-specific association testing in systemic lupus erythematosus (SLE). Extensive phenotyping of quantitative immune variables, sequence-based fine mapping, cross-population and cross-phenotype analyses, and gene-expression studies were used to identify the causal variant and elucidate its mechanism of action. Signatures of positive selection were also investigated. A variant in TNFSF13B, encoding the cytokine and drug target B-cell activating factor (BAFF), was associated with multiple sclerosis as well as SLE. The disease-risk allele was also associated with up-regulated humoral immunity through increased levels of soluble BAFF, B lymphocytes, and immunoglobulins. The causal variant was identified: an insertion-deletion variant, GCTGT→A (in which A is the risk allele), yielded a shorter transcript that escaped microRNA inhibition and increased production of soluble BAFF, which in turn up-regulated humoral immunity. Population genetic signatures indicated that this autoimmunity variant has been evolutionarily advantageous, most likely by augmenting resistance to malaria. A TNFSF13B variant was associated with multiple sclerosis and SLE, and its effects were clarified at the population, cellular, and molecular levels. (Funded by the Italian Foundation for Multiple Sclerosis and others.)

View details for DOI 10.1056/NEJMoa1610528

View details for Web of Science ID 000400071900005

View details for PubMedID 28445677
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome medicine Gottlieb, A., Daneshjou, R., DeGorter, M., Bourgeois, S., Svensson, P. J., Wadelius, M., Deloukas, P., Montgomery, S. B., Altman, R. B. 2017; 9 (1): 98

Abstract

Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

View details for PubMedID 29178968
Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. American journal of epidemiology Ritchie, M. D., Davis, J. R., Aschard, H., Battle, A., Conti, D., Du, M., Eskin, E., Fallin, M. D., Hsu, L., Kraft, P., Moore, J. H., Pierce, B. L., Bien, S. A., Thomas, D. C., Wei, P., Montgomery, S. B. 2017; 186 (7): 771–77

Abstract

A growing knowledge base of genetic and environmental information has greatly enabled the study of disease risk factors. However, the computational complexity and statistical burden of testing all variants by all environments has required novel study designs and hypothesis-driven approaches. We discuss how incorporating biological knowledge from model organisms, functional genomics, and integrative approaches can empower the discovery of novel gene-environment interactions and discuss specific methodological considerations with each approach. We consider specific examples where the application of these approaches has uncovered effects of gene-environment interactions relevant to drug response and immunity, and we highlight how such improvements enable a greater understanding of the pathogenesis of disease and the realization of precision medicine.

View details for DOI 10.1093/aje/kwx229

View details for PubMedID 28978191
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. American journal of epidemiology McAllister, K., Mechanic, L. E., Amos, C., Aschard, H., Blair, I. A., Chatterjee, N., Conti, D., Gauderman, W. J., Hsu, L., Hutter, C. M., Jankowska, M. M., Kerr, J., Kraft, P., Montgomery, S. B., Mukherjee, B., Papanicolaou, G. J., Patel, C. J., Ritchie, M. D., Ritz, B. R., Thomas, D. C., Wei, P., Witte, J. S. 2017; 186 (7): 753–61

Abstract

Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.

View details for DOI 10.1093/aje/kwx227

View details for PubMedID 28978193
Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nature genetics 2017; 49 (12): 1664–70

Abstract

Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.

View details for DOI 10.1038/ng.3969

View details for PubMedID 29019975
The impact of rare variation on gene expression across tissues. Nature Li, X., Kim, Y., Tsang, E. K., Davis, J. R., Damani, F. N., Chiang, C., Hess, G. T., Zappala, Z., Strober, B. J., Scott, A. J., Li, A., Ganna, A., Bassik, M. C., Merker, J. D., Hall, I. M., Battle, A., Montgomery, S. B. 2017; 550 (7675): 239–43

Abstract

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.

View details for PubMedID 29022581
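
The idea of a multi-tissue expression outlier in the abstract above (Li et al., 2017) can be illustrated with a minimal sketch. The abstract does not specify the scoring rule, so everything here is an assumption for illustration: per-tissue Z-scores, the median across tissues as the summary, the cutoff of 2.0, and the hypothetical function names `median_z_scores` and `call_outliers`. It is not the authors' pipeline and it does not implement RIVER.

```python
from statistics import mean, median, pstdev


def median_z_scores(expr_by_tissue):
    """Median per-tissue Z-score for each individual, for a single gene.

    expr_by_tissue: {tissue: {individual: expression value}} (hypothetical layout).
    """
    z_by_individual = {}
    for values in expr_by_tissue.values():
        mu = mean(values.values())
        sd = pstdev(values.values())
        if sd == 0:
            continue  # no variation in this tissue, so Z-scores are undefined
        for individual, x in values.items():
            z_by_individual.setdefault(individual, []).append((x - mu) / sd)
    return {ind: median(zs) for ind, zs in z_by_individual.items()}


def call_outliers(expr_by_tissue, threshold=2.0):
    """Individuals whose |median Z| meets the (assumed) threshold."""
    scores = median_z_scores(expr_by_tissue)
    return {ind: z for ind, z in scores.items() if abs(z) >= threshold}


# Toy data: ten individuals, with "ind9" underexpressed in every tissue.
example = {
    tissue: {f"ind{i}": (low if i == 9 else typical) for i in range(10)}
    for tissue, typical, low in [("blood", 10.0, 0.0), ("lung", 20.0, 2.0), ("liver", 5.0, 0.5)]
}
outliers = call_outliers(example)
```

Summarizing with the median rather than the mean is one way to require that the extreme expression be consistent across tissues instead of driven by a single tissue. A negative score corresponds to an underexpression outlier and a positive score to an overexpression outlier.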
Genetic effects on gene expression across human tissues. Nature Battle, A., Brown, C. D., Engelhardt, B. E., Montgomery, S. B. 2017; 550 (7675): 204–13

Abstract

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

View details for PubMedID 29022597
A TNFRSF14-FcεRI-mast cell pathway contributes to development of multiple features of asthma pathology in mice NATURE COMMUNICATIONS Sibilano, R., Gaudenzio, N., DeGorter, M. K., Reber, L. L., Hernandez, J. D., Starkl, P. M., Zurek, O. W., Tsai, M., Zahner, S., Montgomery, S. B., Roers, A., Kronenberg, M., Yu, M., Galli, S. J. 2016; 7

Abstract

Asthma has multiple features, including airway hyperreactivity, inflammation and remodelling. The TNF superfamily member TNFSF14 (LIGHT), via interactions with the receptor TNFRSF14 (HVEM), can support TH2 cell generation and longevity and promote airway remodelling in mouse models of asthma, but the mechanisms by which TNFSF14 functions in this setting are incompletely understood. Here we find that mouse and human mast cells (MCs) express TNFRSF14 and that TNFSF14:TNFRSF14 interactions can enhance IgE-mediated MC signalling and mediator production. In mouse models of asthma, TNFRSF14 blockade with a neutralizing antibody administered after antigen sensitization, or genetic deletion of Tnfrsf14, diminishes plasma levels of antigen-specific IgG1 and IgE antibodies, airway hyperreactivity, airway inflammation and airway remodelling. Finally, by analysing two types of genetically MC-deficient mice after engrafting MCs that either do or do not express TNFRSF14, we show that TNFRSF14 expression on MCs significantly contributes to the development of multiple features of asthma pathology.

View details for DOI 10.1038/ncomms13696

View details for Web of Science ID 000389853400001

View details for PubMedID 27982078

View details for PubMedCentralID PMC5171877
Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nature methods Hess, G. T., Frésard, L., Han, K., Lee, C. H., Li, A., Cimprich, K. A., Montgomery, S. B., Bassik, M. C. 2016

Abstract

Engineering and study of protein function by directed evolution has been limited by the technical requirement to use global mutagenesis or introduce DNA libraries. Here, we develop CRISPR-X, a strategy to repurpose the somatic hypermutation machinery for protein engineering in situ. Using catalytically inactive dCas9 to recruit variants of cytidine deaminase (AID) with MS2-modified sgRNAs, we can specifically mutagenize endogenous targets with limited off-target damage. This generates diverse libraries of localized point mutations and can target multiple genomic locations simultaneously. We mutagenize GFP and select for spectrum-shifted variants, including EGFP. Additionally, we mutate the target of the cancer therapeutic bortezomib, PSMB5, and identify known and novel mutations that confer bortezomib resistance. Finally, using a hyperactive AID variant, we mutagenize loci both upstream and downstream of transcriptional start sites. These experiments illustrate a powerful approach to create complex libraries of genetic variants in native context, which is broadly applicable to investigate and improve protein function.

View details for DOI 10.1038/nmeth.4038

View details for PubMedID 27798611
Small RNA Sequencing in Cells and Exosomes Identifies eQTLs and 14q32 as a Region of Active Export. G3 (Bethesda, Md.) Tsang, E. K., Abell, N. S., Li, X., Anaya, V., Karczewski, K. J., Knowles, D. A., Sierra, R. G., Smith, K. S., Montgomery, S. B. 2016

Abstract

Exosomes are small extracellular vesicles that carry heterogeneous cargo, including RNA, between cells. Increasing evidence suggests that exosomes are important mediators of intercellular communication and biomarkers of disease. Despite this, the variability of exosomal RNA between individuals has not been well quantified. To assess this variability, we sequenced the small RNA of cells and exosomes from a 17-member family. Across individuals, we show that selective export of miRNAs occurs not only at the level of specific transcripts, but that a cluster of 74 mature miRNAs on chromosome 14q32 is massively exported in exosomes while mostly absent from cells. We also observe more interindividual variability between exosomal samples than between cellular ones and identify four miRNA expression quantitative trait loci shared between cells and exosomes. Our findings indicate that genomically colocated miRNAs can be exported together and highlight the variability in exosomal miRNA levels between individuals as relevant for exosome use as diagnostics.

View details for DOI 10.1534/g3.116.036137

View details for PubMedID 27799337

View details for PubMedCentralID PMC5217120
DNA Methylation Profiling of Uniparental Disomy Subjects Provides a Map of Parental Epigenetic Bias in the Human Genome. American journal of human genetics Joshi, R. S., Garg, P., Zaitlen, N., Lappalainen, T., Watson, C. T., Azam, N., Ho, D., Li, X., Antonarakis, S. E., Brunner, H. G., Buiting, K., Cheung, S. W., Coffee, B., Eggermann, T., Francis, D., Geraedts, J. P., Gimelli, G., Jacobson, S. G., Le Caignec, C., de Leeuw, N., Liehr, T., Mackay, D. J., Montgomery, S. B., Pagnamenta, A. T., Papenhausen, P., Robinson, D. O., Ruivenkamp, C., Schwartz, C., Steiner, B., Stevenson, D. A., Surti, U., Wassink, T., Sharp, A. J. 2016; 99 (3): 555-566

Abstract

Genomic imprinting is a mechanism in which gene expression varies depending on parental origin. Imprinting occurs through differential epigenetic marks on the two parental alleles, with most imprinted loci marked by the presence of differentially methylated regions (DMRs). To identify sites of parental epigenetic bias, here we have profiled DNA methylation patterns in a cohort of 57 individuals with uniparental disomy (UPD) for 19 different chromosomes, defining imprinted DMRs as sites where the maternal and paternal methylation levels diverge significantly from the biparental mean. Using this approach we identified 77 DMRs, including nearly all those described in previous studies, in addition to 34 DMRs not previously reported. These include a DMR at TUBGCP5 within the recurrent 15q11.2 microdeletion region, suggesting potential parent-of-origin effects associated with this genomic disorder. We also observed a modest parental bias in DNA methylation levels at every CpG analyzed across ∼1.9 Mb of the 15q11-q13 Prader-Willi/Angelman syndrome region, demonstrating that the influence of imprinting is not limited to individual regulatory elements such as CpG islands, but can extend across entire chromosomal domains. Using RNA-seq data, we detected signatures consistent with imprinted expression associated with nine novel DMRs. Finally, using a population sample of 4,004 blood methylomes, we define patterns of epigenetic variation at DMRs, identifying rare individuals with global gain or loss of methylation across multiple imprinted loci. Our data provide a detailed map of parental epigenetic bias in the human genome, providing insights into potential parent-of-origin effects.

View details for DOI 10.1016/j.ajhg.2016.06.032

View details for PubMedID 27569549
Impact of the X Chromosome and sex on regulatory variation GENOME RESEARCH Kukurba, K. R., Parsana, P., Balliu, B., Smith, K. S., Zappala, Z., Knowles, D. A., Fave, M., Davis, J. R., Li, X., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Kundaje, A., Levinson, D. F., Awadalla, P., Mostafavi, S., Battle, A., Montgomery, S. B. 2016; 26 (6): 768-777

Abstract

The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences offers to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.

View details for DOI 10.1101/gr.197897.115

View details for PubMedID 27197214
An Efficient Multiple-Testing Adjustment for eQTL Studies that Accounts for Linkage Disequilibrium between Variants AMERICAN JOURNAL OF HUMAN GENETICS Davis, J. R., Fresard, L., Knowles, D. A., Pala, M., Bustamante, C. D., Battle, A., Montgomery, S. B. 2016; 98 (1): 216-224

View details for DOI 10.1016/j.ajhg.2015.11.021

View details for Web of Science ID 000368050800016

Abstract

Methods for multiple-testing correction in local expression quantitative trait locus (cis-eQTL) studies are a trade-off between statistical power and computational efficiency. Bonferroni correction, though computationally trivial, is overly conservative and fails to account for linkage disequilibrium between variants. Permutation-based methods are more powerful, though computationally far more intensive. We present an alternative correction method called eigenMT, which runs over 500 times faster than permutations and has adjusted p values that closely approximate empirical ones. To achieve this speed while also maintaining the accuracy of permutation-based methods, we estimate the effective number of independent variants tested for association with a particular gene, termed Meff, by using the eigenvalue decomposition of the genotype correlation matrix. We employ a regularized estimator of the correlation matrix to ensure Meff is robust and yields adjusted p values that closely approximate p values from permutations. Finally, using a common genotype matrix, we show that eigenMT can be applied with even greater efficiency to studies across tissues or conditions. Our method provides a simpler, more efficient approach to multiple-testing correction than existing methods and fits within existing pipelines for eQTL discovery.

View details for PubMedID 26749306

View details for PubMedCentralID PMC4716687
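
The Meff idea in the eigenMT abstract above (Davis et al., 2016) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the published eigenMT implementation: the fixed shrinkage toward the identity matrix stands in for the regularized correlation estimator mentioned in the abstract, the 99% variance-explained cutoff is an assumed choice, and the function names `estimate_meff` and `adjust_min_p` are hypothetical.

```python
import numpy as np


def estimate_meff(genotypes, shrinkage=0.1, var_explained=0.99):
    """Estimate the effective number of independent variants for one gene.

    genotypes: (n_samples, n_variants) array of allele dosages with no
    monomorphic variants (a constant column has an undefined correlation).
    shrinkage and var_explained are assumed tuning values for this sketch.
    """
    corr = np.corrcoef(genotypes, rowvar=False)
    n_variants = corr.shape[0]
    # Simple regularization: shrink the sample correlation toward the identity.
    corr = (1.0 - shrinkage) * corr + shrinkage * np.eye(n_variants)
    # Eigenvalues of a symmetric matrix, sorted from largest to smallest.
    eigvals = np.clip(np.linalg.eigvalsh(corr)[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of eigenvalues reaching the variance-explained cutoff.
    return int(np.searchsorted(cumulative, var_explained) + 1)


def adjust_min_p(min_p, meff):
    """Bonferroni-style adjustment using Meff instead of the raw variant count."""
    return min(1.0, min_p * meff)


# Three mutually uncorrelated variants (a 2^3 factorial design) ...
independent = np.array([[a, b, c] for c in (0, 1) for a in (0, 1) for b in (0, 1)], dtype=float)
# ... versus five copies of one variant, i.e. perfect linkage disequilibrium.
perfect_ld = np.tile(independent[:, :1], (1, 5))
```

With these toy inputs, the uncorrelated variants count as three independent tests, while the five perfectly correlated variants collapse to a single effective test when no shrinkage is applied. That contrast is why an Meff-based correction is less conservative than Bonferroni correction over the raw number of variants.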
ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic acids research Lesurf, R., Cotto, K. C., Wang, G., Griffith, M., Kasaian, K., Jones, S. J., Montgomery, S. B., Griffith, O. L. 2016; 44 (D1): D126-32

Abstract

The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.

View details for DOI 10.1093/nar/gkv1203

View details for PubMedID 26578589

View details for PubMedCentralID PMC4702855
Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci. Nature communications Miller, C. L., Pjanic, M., Wang, T., Nguyen, T., Cohain, A., Lee, J. D., Perisic, L., Hedin, U., Kundu, R. K., Majmudar, D., Kim, J. B., Wang, O., Betsholtz, C., Ruusalepp, A., Franzén, O., Assimes, T. L., Montgomery, S. B., Schadt, E. E., Björkegren, J. L., Quertermous, T. 2016; 7: 12092

Abstract

Coronary artery disease (CAD) is the leading cause of mortality and morbidity, driven by both genetic and environmental risk factors. Meta-analyses of genome-wide association studies have identified >150 loci associated with CAD and myocardial infarction susceptibility in humans. A majority of these variants reside in non-coding regions and are co-inherited with hundreds of candidate regulatory variants, presenting a challenge to elucidate their functions. Herein, we use integrative genomic, epigenomic and transcriptomic profiling of perturbed human coronary artery smooth muscle cells and tissues to begin to identify causal regulatory variation and mechanisms responsible for CAD associations. Using these genome-wide maps, we prioritize 64 candidate variants and perform allele-specific binding and expression analyses at seven top candidate loci: 9p21.3, SMAD3, PDGFD, IL6R, BMP1, CCDC97/TGFB1 and LMOD1. We validate our findings in expression quantitative trait loci cohorts, which together reveal new links between CAD associations and regulatory function in the appropriate disease context.

View details for DOI 10.1038/ncomms12092

View details for PubMedID 27386823
Non-Coding Loss-of-Function Variation in Human Genomes HUMAN HEREDITY Zappala, Z., Montgomery, S. B. 2016; 81 (2): 78-87

Abstract

Whole-genome and exome sequencing in human populations has revealed the tolerance of each gene for loss-of-function variation. By understanding this tolerance, it has become increasingly possible to identify genes that would make safe therapeutic targets and to identify rare genetic risk factors and phenotypes at the scale of individual genomes. To date, the vast majority of surveyed loss-of-function variants are in protein-coding regions of the genome mainly due to the focus on these regions by exome-based sequencing projects and their relative ease of interpretability. As whole-genome sequencing becomes more prevalent, new strategies will be required to uncover impactful variation in non-coding regions of the genome where the architecture of genome function is more complex. In this review, we investigate recent studies of loss-of-function variation and emerging approaches for interpreting whole-genome sequencing data to identify rare and impactful non-coding loss-of-function variants.

View details for DOI 10.1159/000447453

View details for Web of Science ID 000392559600029

View details for PubMedID 28076858
A global reference for human genetic variation NATURE Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Donnelly, P., Eichler, E. E., Flicek, P., Gabriel, S. B., Gibbs, R. A., Green, E. D., Hurles, M. E., Knoppers, B. M., Korbel, J. O., Lander, E. S., Lee, C., Lehrach, H., Mardis, E. R., Marth, G. T., McVean, G. A., Nickerson, D. A., Schmidt, J. P., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Boerwinkle, E., Doddapaneni, H., Han, Y., Korchina, V., Kovar, C., Lee, S., Muzny, D., Reid, J. G., Zhu, Y., Wang, J., Chang, Y., Feng, Q., Fang, X., Guo, X., Jian, M., Jiang, H., Jin, X., Lan, T., Li, G., Li, J., Li, Y., Liu, S., Liu, X., Lu, Y., Ma, X., Tang, M., Wang, B., Wang, G., Wu, H., Wu, R., Xu, X., Yin, Y., Zhang, D., Zhang, W., Zhao, J., Zhao, M., Zheng, X., Lander, E. S., Altshuler, D. M., Gabriel, S. B., Gupta, N., Gharani, N., Toji, L. H., Gerry, N. P., Resch, A. M., Flicek, P., Barker, J., Clarke, L., Gil, L., Hunt, S. E., Kelman, G., Kulesha, E., Leinonen, R., McLaren, W. M., Radhakrishnan, R., Roa, A., Smirnov, D., Smith, R. E., Streeter, I., Thormann, A., Toneva, I., Vaughan, B., Zheng-Bradley, X., Bentley, D. R., Grocock, R., Humphray, S., James, T., Kingsbury, Z., Lehrach, H., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Borodina, T. A., Lienhard, M., Mertes, F., Sultan, M., Timmermann, B., Yaspo, M., Mardis, E. R., Wilson, R. K., Fulton, L., Fulton, R., Sherry, S. T., Ananiev, V., Belaia, Z., Beloslyudtsev, D., Bouk, N., Chen, C., Church, D., Cohen, R., Cook, C., Garner, J., Hefferon, T., Kimelman, M., Liu, C., Lopez, J., Meric, P., O'Sullivan, C., Ostapchuk, Y., Phan, L., Ponomarov, S., Schneider, V., Shekhtman, E., Sirotkin, K., Slotta, D., Zhang, H., McVean, G. A., Durbin, R. M., Balasubramaniam, S., Burton, J., Danecek, P., Keane, T. M., Kolb-Kokocinski, A., McCarthy, S., Stalker, J., Quail, M., Schmidt, J. P., Davies, C. J., Gollub, J., Webster, T., Wong, B., Zhan, Y., Auton, A., Campbell, C. L., Kong, Y., Marcketta, A., Gibbs, R. A., Yu, F., Antunes, L., Bainbridge, M., Muzny, D., Sabo, A., Huang, Z., Wang, J., Coin, L. J., Fang, L., Guo, X., Jin, X., Li, G., Li, Q., Li, Y., Li, Z., Lin, H., Liu, B., Luo, R., Shao, H., Xie, Y., Ye, C., Yu, C., Zhang, F., Zheng, H., Zhu, H., Alkan, C., Dal, E., Kahveci, F., Marth, G. T., Garrison, E. P., Kural, D., Lee, W., Leong, W. F., Stromberg, M., Ward, A. N., Wu, J., Zhang, M., Daly, M. J., DePristo, M. A., Handsaker, R. E., Altshuler, D. M., Banks, E., Bhatia, G., del Angel, G., Gabriel, S. B., Genovese, G., Gupta, N., Li, H., Kashin, S., Lander, E. S., McCarroll, S. A., Nemesh, J. C., Poplin, R. E., Yoon, S. C., Lihm, J., Makarov, V., Clark, A. G., Gottipati, S., Keinan, A., Rodriguez-Flores, J. L., Korbel, J. O., Rausch, T., Fritz, M. H., Stuetz, A. M., Flicek, P., Beal, K., Clarke, L., Datta, A., Herrero, J., McLaren, W. M., Ritchie, G. R., Smith, R. E., Zerbino, D., Zheng-Bradley, X., Sabeti, P. C., Shlyakhter, I., Schaffner, S. F., Vitti, J., Cooper, D. N., Ball, E. V., Stenson, P. D., Bentley, D. R., Barnes, B., Bauer, M., Cheetham, R. K., Cox, A., Eberle, M., Humphray, S., Kahn, S., Murray, L., Peden, J., Shaw, R., Kenny, E. E., Batzer, M. A., Konkel, M. K., Walker, J. A., MacArthur, D. G., Lek, M., Sudbrak, R., Amstislavskiy, V. S., Herwig, R., Mardis, E. R., Ding, L., Koboldt, D. C., Larson, D., Ye, K., Gravel, S., Swaroop, A., Chew, E., Lappalainen, T., Erlich, Y., Gymrek, M., Willems, T. F., Simpson, J. T., Shriver, M. D., Rosenfeld, J. A., Bustamante, C. D., Montgomery, S. B., De La Vega, F. M., Byrnes, J. K., Carroll, A. W., DeGorter, M. K., Lacroute, P., Maples, B. K., Martin, A. R., Moreno-Estrada, A., Shringarpure, S. S., Zakharia, F., Halperin, E., Baran, Y., Lee, C., Cerveira, E., Hwang, J., Malhotra, A., Plewczynski, D., Radew, K., Romanovitch, M., Zhang, C., Hyland, F. C., Craig, D. W., Christoforides, A., Homer, N., Izatt, T., Kurdoglu, A. A., Sinari, S. A., Squire, K., Sherry, S. T., Xiao, C., Sebat, J., Antaki, D., Gujral, M., Noor, A., Ye, K., Burchard, E. G., Hernandez, R. D., Gignoux, C. R., Haussler, D., Katzman, S. J., Kent, W. J., Howie, B., Ruiz-Linares, A., Dermitzakis, E. T., Devine, S. E., Goncalo, R. A., Kang, H. M., Kidd, J. M., Blackwell, T., Caron, S., Chen, W., Emery, S., Fritsche, L., Fuchsberger, C., Jun, G., Li, B., Lyons, R., Scheller, C., Sidore, C., Song, S., Sliwerska, E., Taliun, D., Tan, A., Welch, R., Wing, M. K., Zhan, X., Awadalla, P., Hodgkinson, A., Li, Y., Shi, X., Quitadamo, A., Lunter, G., McVean, G. A., Marchini, J. L., Myers, S., Churchhouse, C., Delaneau, O., Gupta-Hinch, A., Kretzschmar, W., Iqbal, Z., Mathieson, I., Menelaou, A., Rimmer, A., Xifara, D. K., Oleksyk, T. K., Fu, Y., Liu, X., Xiong, M., Jorde, L., Witherspoon, D., Xing, J., Eichler, E. E., Browning, B. L., Browning, S. R., Hormozdiari, F., Sudmant, P. H., Khurana, E., Durbin, R. M., Hurles, M. E., Tyler-Smith, C., Albers, C. A., Ayub, Q., Balasubramaniam, S., Chen, Y., Colonna, V., Danecek, P., Jostins, L., Keane, T. M., McCarthy, S., Walter, K., Xue, Y., Gerstein, M. B., Abyzov, A., Balasubramanian, S., Chen, J., Clarke, D., Fu, Y., Harmanci, A. O., Jin, M., Lee, D., Liu, J., Mu, X. J., Zhang, J., Zhang, Y., Li, Y., Luo, R., Zhu, H., Alkan, C., Dal, E., Kahveci, F., Marth, G. T., Garrison, E. P., Kural, D., Lee, W., Ward, A. N., Wu, J., Zhang, M., McCarroll, S. A., Handsaker, R. E., Altshuler, D. M., Banks, E., del Angel, G., Genovese, G., Hartl, C., Li, H., Kashin, S., Nemesh, J. C., Shakir, K., Yoon, S. C., Lihm, J., Makarov, V., Degenhardt, J., Korbel, J. O., Fritz, M. H., Meiers, S., Raeder, B., Rausch, T., Stuetz, A. M., Flicek, P., Casale, F. P., Clarke, L., Smith, R. E., Stegle, O., Zheng-Bradley, X., Bentley, D. R., Barnes, B., Cheetham, R. K., Eberle, M., Humphray, S., Kahn, S., Murray, L., Shaw, R., Lameijer, E., Batzer, M. A., Konkel, M. K., Walker, J. A., Ding, L., Hall, I., Ye, K., Lacroute, P., Lee, C., Cerveira, E., Malhotra, A., Hwang, J., Plewczynski, D., Radew, K., Romanovitch, M., Zhang, C., Craig, D. W., Homer, N., Church, D., Xiao, C., Sebat, J., Antaki, D., Bafna, V., Michaelson, J., Ye, K., Devine, S. E., Gardner, E. J., Abecasis, G. R., Kidd, J. M., Mills, R. E., Dayama, G., Emery, S., Jun, G., Shi, X., Quitadamo, A., Lunter, G., McVean, G. A., Chen, K., Fan, X., Chong, Z., Chen, T., Witherspoon, D., Xing, J., Eichler, E. E., Chaisson, M. J., Hormozdiari, F., Huddleston, J., Malig, M., Nelson, B. J., Sudmant, P. H., Parrish, N. F., Khurana, E., Hurles, M. E., Blackburne, B., Lindsay, S. J., Ning, Z., Walter, K., Zhang, Y., Gerstein, M. B., Abyzov, A., Chen, J., Clarke, D., Lam, H., Mu, X. J., Sisu, C., Zhang, J., Zhang, Y., Gibbs, R. A., Yu, F., Bainbridge, M., Challis, D., Evani, U. S., Kovar, C., Lu, J., Muzny, D., Nagaswamy, U., Reid, J. G., Sabo, A., Yu, J., Guo, X., Li, W., Li, Y., Wu, R., Marth, G. T., Garrison, E. P., Leong, W. F., Ward, A. N., del Angel, G., DePristo, M. A., Gabriel, S. B., Gupta, N., Hartl, C., Poplin, R. E., Clark, A. G., Rodriguez-Flores, J. L., Flicek, P., Clarke, L., Smith, R. E., Zheng-Bradley, X., MacArthur, D. G., Mardis, E. R., Fulton, R., Koboldt, D. C., Gravel, S., Bustamante, C. D., Craig, D. W., Christoforides, A., Homer, N., Izatt, T., Sherry, S. T., Xiao, C., Dermitzakis, E. T., Abecasis, G. R., Kang, H. M., McVean, G. A., Gerstein, M. B., Balasubramanian, S., Habegger, L., Yu, H., Flicek, P., Clarke, L., Cunningham, F., Dunham, I., Zerbino, D., Zheng-Bradley, X., Lage, K., Jespersen, J. B., Horn, H., Montgomery, S. B., DeGorter, M. K., Khurana, E., Tyler-Smith, C., Chen, Y., Colonna, V., Xue, Y., Gerstein, M. B., Balasubramanian, S., Fu, Y., Kim, D., Auton, A., Marcketta, A., DeSalle, R., Narechania, A., Sayres, M. A., Garrison, E. P., Handsaker, R. E., Kashin, S., McCarroll, S. A., Rodriguez-Flores, J. L., Flicek, P., Clarke, L., Zheng-Bradley, X., Erlich, Y., Gymrek, M., Willems, T. F., Bustamante, C. D., Mendez, F. L., Poznik, G. D., Underhill, P. A., Lee, C., Cerveira, E., Malhotra, A., Romanovitch, M., Zhang, C., Abecasis, G. R., Coin, L., Shao, H., Mittelman, D., Tyler-Smith, C., Ayub, Q., Banerjee, R., Cerezo, M., Chen, Y., Fitzgerald, T., Louzada, S., Massaia, A., McCarthy, S., Ritchie, G. R., Xue, Y., Yang, F., Gibbs, R. A., Kovar, C., Kalra, D., Hale, W., Muzny, D., Reid, J. G., Wang, J., Dan, X., Guo, X., Li, G., Li, Y., Ye, C., Zheng, X., Altshuler, D. M., Flicek, P., Clarke, L., Zheng-Bradley, X., Bentley, D. R., Cox, A., Humphray, S., Kahn, S., Sudbrak, R., Albrecht, M. W., Lienhard, M., Larson, D., Craig, D. W., Izatt, T., Kurdoglu, A. A., Sherry, S. T., Xiao, C., Haussler, D., Abecasis, G. R., McVean, G. A., Durbin, R. M., Balasubramaniam, S., Keane, T. M., McCarthy, S., Stalker, J., Chakravarti, A., Knoppers, B. M., Abecasis, G. R., Barnes, K. C., Beiswanger, C., Burchard, E. G., Bustamante, C. D., Cai, H., Cao, H., Durbin, R. M., Gerry, N. P., Gharani, N., Gibbs, R. A., Gignoux, C. R., Gravel, S., Henn, B., Jones, D., Jorde, L., Kaye, J. S., Keinan, A., Kent, A., Kerasidou, A., Li, Y., Mathias, R., McVean, G. A., Moreno-Estrada, A., Ossorio, P. N., Parker, M., Resch, A. M., Rotimi, C. N., Royal, C. D., Sandoval, K., Su, Y., Sudbrak, R., Tian, Z., Tishkoff, S., Toji, L. H., Tyler-Smith, C., Via, M., Wang, Y., Yang, H., Yang, L., Zhu, J., Bodmer, W., Bedoya, G., Ruiz-Linares, A., Cai, Z., Gao, Y., Chu, J., Peltonen, L., Garcia-Montero, A., Orfao, A., Dutil, J., Martinez-Cruzado, J. C., Oleksyk, T. K., Barnes, K. C., Mathias, R. A., Hennis, A., Watson, H., McKenzie, C., Qadri, F., LaRocque, R., Sabeti, P. C., Zhu, J., Deng, X., Sabeti, P. C., Asogun, D., Folarin, O., Happi, C., Omoniwa, O., Stremlau, M., Tariyal, R., Jallow, M., Joof, F. S., Corrah, T., Rockett, K., Kwiatkowski, D., Kooner, J., Tran Tinh Hien, T. T., Dunstan, S. J., Nguyen Thuy Hang, N. T., Fonnie, R., Garry, R., Kanneh, L., Moses, L., Sabeti, P. C., Schieffelin, J., Grant, D. S., Gallo, C., Poletti, G., Saleheen, D., Rasheed, A., Brook, L. D., Felsenfeld, A., McEwen, J. E., Vaydylevich, Y., Green, E. D., Duncanson, A., Dunn, M., Schloss, J. A., Wang, J., Yang, H., Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S., McVean, G. A., Abecasis, G. R. 2015; 526 (7571): 68

Abstract

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

View details for DOI 10.1038/nature15393

View details for Web of Science ID 000362095100036
The landscape of genomic imprinting across diverse adult human tissues GENOME RESEARCH Baran, Y., Subramaniam, M., Biton, A., Tukiainen, T., Tsang, E. K., Rivas, M. A., Pirinen, M., Gutierrez-Arcelus, M., Smith, K. S., Kukurba, K. R., Zhang, R., Eng, C., Torgerson, D. G., Urbanek, C., Li, J. B., Rodriguez-Santana, J. R., Burchard, E. G., Seibold, M. A., MacArthur, D. G., Montgomery, S. B., Zaitlen, N. A., Lappalainen, T. 2015; 25 (7): 927-936

Abstract

Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.

View details for DOI 10.1101/gr.192278.115

View details for Web of Science ID 000357356900001

View details for PubMedID 25953952

View details for PubMedCentralID PMC4484390
Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science Rivas, M. A., Pirinen, M., Conrad, D. F., Lek, M., Tsang, E. K., Karczewski, K. J., Maller, J. B., Kukurba, K. R., DeLuca, D. S., Fromer, M., Ferreira, P. G., Smith, K. S., Zhang, R., Zhao, F., Banks, E., Poplin, R., Ruderfer, D. M., Purcell, S. M., Tukiainen, T., Minikel, E. V., Stenson, P. D., Cooper, D. N., Huang, K. H., Sullivan, T. J., Nedzel, J., Bustamante, C. D., Li, J. B., Daly, M. J., Guigo, R., Donnelly, P., Ardlie, K., Sammeth, M., Dermitzakis, E. T., McCarthy, M. I., Montgomery, S. B., Lappalainen, T., MacArthur, D. G. 2015; 348 (6235): 666-669

Abstract

Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.

View details for DOI 10.1126/science.1261877

View details for PubMedID 25954003

View details for Web of Science ID 000354045700038

View details for PubMedCentralID PMC4537935
Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse. Nature genetics Babak, T., Deveale, B., Tsang, E. K., Zhou, Y., Li, X., Smith, K. S., Kukurba, K. R., Zhang, R., Li, J. B., van der Kooy, D., Montgomery, S. B., Fraser, H. B. 2015; 47 (5): 544-549

Abstract

Genomic imprinting is an epigenetic process that restricts gene expression to either the maternally or paternally inherited allele. Many theories have been proposed to explain its evolutionary origin, but understanding has been limited by a paucity of data mapping the breadth and dynamics of imprinting within any organism. We generated an atlas of imprinting spanning 33 mouse and 45 human developmental stages and tissues. Nearly all imprinted genes were imprinted in early development and either retained their parent-of-origin expression in adults or lost it completely. Consistent with an evolutionary signature of parental conflict, imprinted genes were enriched for coexpressed pairs of maternally and paternally expressed genes, showed accelerated expression divergence between human and mouse, and were more highly expressed than their non-imprinted orthologs in other species. Our approach demonstrates a general framework for the discovery of imprinting in any species and sheds light on the causes and consequences of genomic imprinting in mammals.

View details for DOI 10.1038/ng.3274

View details for PubMedID 25848752

View details for PubMedCentralID PMC4414907
RNA Sequencing and Analysis. Cold Spring Harbor protocols Kukurba, K. R., Montgomery, S. B. 2015; 2015 (11): 951-969

Abstract

RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcription. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.

View details for DOI 10.1101/pdb.top084970

View details for PubMedID 25870306
Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing. PLoS genetics Gutierrez-Arcelus, M., Ongen, H., Lappalainen, T., Montgomery, S. B., Buil, A., Yurovsky, A., Bryois, J., Padioleau, I., Romano, L., Planchon, A., Falconnet, E., Bielser, D., Gagnebin, M., Giger, T., Borel, C., Letourneau, A., Makrythanasis, P., Guipponi, M., Gehrig, C., Antonarakis, S. E., Dermitzakis, E. T. 2015; 11 (1)

Abstract

Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.

View details for DOI 10.1371/journal.pgen.1004958

View details for PubMedID 25634236

View details for PubMedCentralID PMC4310612
Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing. Molecular psychiatry Mostafavi, S., Battle, A., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Beckman, K., Haudenschild, C., McCormick, C., Mei, R., Gameroff, M. J., Gindes, H., Adams, P., Goes, F. S., Mondimore, F. M., MacKinnon, D. F., Notes, L., Schweizer, B., Furman, D., Montgomery, S. B., Urban, A. E., Koller, D., Levinson, D. F. 2014; 19 (12): 1267-1274

Abstract

A study of genome-wide gene expression in major depressive disorder (MDD) was undertaken in a large population-based sample to determine whether altered expression levels of genes and pathways could provide insights into biological mechanisms that are relevant to this disorder. Gene expression studies have the potential to detect changes that may be because of differences in common or rare genomic sequence variation, environmental factors or their interaction. We recruited a European ancestry sample of 463 individuals with recurrent MDD and 459 controls, obtained self-report and semi-structured interview data about psychiatric and medical history and other environmental variables, sequenced RNA from whole blood and genotyped a genome-wide panel of common single-nucleotide polymorphisms. We used analytical methods to identify MDD-related genes and pathways using all of these sources of information. In analyses of association between MDD and expression levels of 13 857 single autosomal genes, accounting for multiple technical, physiological and environmental covariates, a significant excess of low P-values was observed, but there was no significant single-gene association after genome-wide correction. Pathway-based analyses of expression data detected significant association of MDD with increased expression of genes in the interferon α/β signaling pathway. This finding could not be explained by potentially confounding diseases and medications (including antidepressants) or by computationally estimated proportions of white blood cell types. Although cause-effect relationships cannot be determined from these data, the results support the hypothesis that altered immune signaling has a role in the pathogenesis, manifestation, and/or the persistence and progression of MDD. Molecular Psychiatry advance online publication, 3 December 2013; doi:10.1038/mp.2013.161.

View details for DOI 10.1038/mp.2013.161

View details for PubMedID 24296977

View details for Web of Science ID 000345423500004
High-Resolution Transcriptome Analysis with Long-Read RNA Sequencing PLOS ONE Cho, H., Davis, J., Li, X., Smith, K. S., Battle, A., Montgomery, S. B. 2014; 9 (9)

Abstract

RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcript. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

View details for DOI 10.1371/journal.pone.0108095

View details for Web of Science ID 000342492700076

View details for PubMedCentralID PMC4176000
Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. American journal of human genetics Li, X., Battle, A., Karczewski, K. J., Zappala, Z., Knowles, D. A., Smith, K. S., Kukurba, K. R., Wu, E., Simon, N., Montgomery, S. B. 2014; 95 (3): 245-256

Abstract

Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.

View details for DOI 10.1016/j.ajhg.2014.08.004

View details for PubMedID 25192044

View details for PubMedCentralID PMC4157143
Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics Martin, A. R., Costa, H. A., Lappalainen, T., Henn, B. M., Kidd, J. M., Yee, M., Grubert, F., Cann, H. M., Snyder, M., Montgomery, S. B., Bustamante, C. D. 2014; 10 (8)

Abstract

Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

View details for DOI 10.1371/journal.pgen.1004549

View details for PubMedID 25121757
Cis and trans effects of human genomic variants on gene expression. PLoS genetics Bryois, J., Buil, A., Evans, D. M., Kemp, J. P., Montgomery, S. B., Conrad, D. F., Ho, K. M., Ring, S., Hurles, M., Deloukas, P., Davey Smith, G., Dermitzakis, E. T. 2014; 10 (7)

Abstract

Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to diseases in case of misregulation. In order to detect genetic variations affecting gene expression, we performed association analysis of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort in cis and in trans. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated to complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated to 92 genes in trans with the cis-eQTLs of the transcriptions factors BATF3 and HMX2 affecting the most genes. We then explored if the variation of the level of expression of the cis genes were causally affecting the level of expression of the trans genes and discovered several causal relationships between variation in the level of expression of the cis gene and variation of the level of expression of the trans gene. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.

View details for DOI 10.1371/journal.pgen.1004461

View details for PubMedID 25010687

View details for PubMedCentralID PMC4091791
Determining causality and consequence of expression quantitative trait loci HUMAN GENETICS Battle, A., Montgomery, S. B. 2014; 133 (6): 727-735

Abstract

Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically-surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalog provides an important resource for understanding the molecular basis of common genetic diseases. However, only now has both the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.

View details for DOI 10.1007/s00439-014-1446-0

View details for Web of Science ID 000336317000005

View details for PubMedID 24770875
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues. PLoS genetics Kukurba, K. R., Zhang, R., Li, X., Smith, K. S., Knowles, D. A., How Tan, M., Piskol, R., Lek, M., Snyder, M., MacArthur, D. G., Li, J. B., Montgomery, S. B. 2014; 10 (5)

Abstract

Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

View details for DOI 10.1371/journal.pgen.1004304

View details for PubMedID 24786518
Dissecting the causal genetic mechanisms of coronary heart disease. Current atherosclerosis reports Miller, C. L., Assimes, T. L., Montgomery, S. B., Quertermous, T. 2014; 16 (5): 406

Abstract

Large-scale genome-wide association studies (GWAS) have identified 46 loci that are associated with coronary heart disease (CHD). Additionally, 104 independent candidate variants (false discovery rate of 5 %) have been identified (Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. Nat Genet 43:333-8, 2011; Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR et al. Nat Genet 45:25-33, 2012; C4D Genetics Consortium. Nat Genet 43:339-44, 2011). The majority of the causal genes in these loci function independently of conventional risk factors. It is postulated that a number of the CHD-associated genes regulate basic processes in the vascular cells involved in atherosclerosis, and that study of the signaling pathways that are modulated in this cell type by causal regulatory variation will provide critical new insights for targeting the initiation and progression of disease. In this review, we will discuss the types of experimental approaches and data that are critical to understanding the molecular processes that underlie the disease risk at 9p21.3, TCF21, SORT1, and other CHD-associated loci.

View details for DOI 10.1007/s11883-014-0406-4

View details for PubMedID 24623178
SplicePlot: a utility for visualizing splicing quantitative trait loci. Bioinformatics Wu, E., Nance, T., Montgomery, S. B. 2014; 30 (7): 1025-1026

Abstract

RNA-Sequencing has provided unprecedented resolution of alternative splicing and splicing-quantitative trait loci (sQTL). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command line utility that produces intuitive visualization of sQTLs and their effects. SplicePlot takes mapped RNA-seq reads in BAM format and genotype data in VCF format as input and outputs publication quality sashimi plots, hive plots, and structure plots enabling better investigation and understanding of the role of genetics on alternative splicing and transcript structure. Availability and Implementation: Source code and detailed documentation are available at http://montgomerylab.stanford.edu/spliceplot/index.html under Resources and at Github. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available. Contact: wu.eric.g@gmail.com or smontgom@stanford.edu.

View details for DOI 10.1093/bioinformatics/btt733

View details for PubMedID 24363378
Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240

Abstract

The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.

View details for PubMedID 24297550
Transcriptome analysis reveals differential splicing events in IPF lung tissue. PloS one Nance, T., Smith, K. S., Anaya, V., Richardson, R., Ho, L., Pala, M., Mostafavi, S., Battle, A., Feghali-Bostwick, C., Rosen, G., Montgomery, S. B. 2014; 9 (5)

Abstract

Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.

View details for DOI 10.1371/journal.pone.0097550

View details for PubMedID 24805851
High-resolution transcriptome analysis with long-read RNA sequencing. PloS one Cho, H., Davis, J., Li, X., Smith, K. S., Battle, A., Montgomery, S. B. 2014; 9 (9)

Abstract

RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

View details for DOI 10.1371/journal.pone.0108095

View details for PubMedID 25251678
Transcriptome Analysis Reveals Differential Splicing Events in IPF Lung Tissue. PloS one Nance, T., Smith, K. S., Anaya, V., Richardson, R., Ho, L., Pala, M., Mostafavi, S., Battle, A., Feghali-Bostwick, C., Rosen, G., Montgomery, S. B. 2014; 9 (3): e92111

Abstract

Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.

View details for DOI 10.1371/journal.pone.0092111

View details for PubMedID 24647608

View details for PubMedCentralID PMC3960165
Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nature methods Zhang, R., Li, X., Ramaswami, G., Smith, K. S., Turecki, G., Montgomery, S. B., Li, J. B. 2014; 11 (1): 51-54

Abstract

We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.

View details for DOI 10.1038/nmeth.2736

View details for PubMedID 24270603

View details for PubMedCentralID PMC3877737
Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals GENOME RESEARCH Battle, A., Mostafavi, S., Zhu, X., Potash, J. B., Weissman, M. M., McCormick, C., Haudenschild, C. D., Beckman, K. B., Shi, J., Mei, R., Urban, A. E., Montgomery, S. B., Levinson, D. F., Koller, D. 2014; 24 (1): 14-24

Abstract

Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation, by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.

View details for DOI 10.1101/gr.155192.113

View details for PubMedID 24092820
Performance of genomic medicine. Genome biology Karczewski, K. J., Montgomery, S. B. 2013; 14 (12): 316

Abstract

A report on the Cold Spring Harbor Laboratory meeting on Precision Medicine: Personal Genomes and Pharmacogenomics, held in Cold Spring Harbor, New York, USA, November 13-16, 2013.

View details for DOI 10.1186/gb4146

View details for PubMedID 24359965
Transcriptome and genome sequencing uncovers functional variation in humans. Nature Lappalainen, T., Sammeth, M., Friedländer, M. R., 't Hoen, P. A., Monlong, J., Rivas, M. A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P. G., Barann, M., Wieland, T., Greger, L., van Iterson, M., Almlöf, J., Ribeca, P., Pulyakhina, I., Esser, D., Giger, T., Tikhonov, A., Sultan, M., Bertier, G., MacArthur, D. G., Lek, M., Lizano, E., Buermans, H. P., Padioleau, I., Schwarzmayr, T., Karlberg, O., Ongen, H., Kilpinen, H., Beltran, S., Gut, M., Kahlem, K., Amstislavskiy, V., Stegle, O., Pirinen, M., Montgomery, S. B., Donnelly, P., McCarthy, M. I., Flicek, P., Strom, T. M., Lehrach, H., Schreiber, S., Sudbrak, R., Carracedo, A., Antonarakis, S. E., Häsler, R., Syvänen, A., van Ommen, G., Brazma, A., Meitinger, T., Rosenstiel, P., Guigó, R., Gut, I. G., Estivill, X., Dermitzakis, E. T. 2013; 501 (7468): 506-511

Abstract

Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

View details for DOI 10.1038/nature12531

View details for PubMedID 24037378

View details for Web of Science ID 000324826300049
Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612

Abstract

Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

View details for DOI 10.1073/pnas.1219099110

View details for PubMedID 23690573
Desktop transcriptome sequencing from archival tissue to identify clinically relevant translocations. American journal of surgical pathology Sweeney, R. T., Zhang, B., Zhu, S. X., Varma, S., Smith, K. S., Montgomery, S. B., van de Rijn, M., Zehnder, J., West, R. B. 2013; 37 (6): 796-803

Abstract

Somatic mutations, often translocations or single nucleotide variations, are pathognomonic for certain types of cancers and are increasingly of clinical importance for diagnosis and prediction of response to therapy. Conventional clinical assays only evaluate 1 mutation at a time, and targeted tests are often constrained to identify only the most common mutations. Genome-wide or transcriptome-wide high-throughput sequencing (HTS) of clinical samples offers an opportunity to evaluate for all clinically significant mutations with a single test. Recently a "desktop version" of HTS has become available, but most of the experience to date is based on data obtained from high-quality DNA from frozen specimens. In this study, we demonstrate, as a proof of principle, that translocations in sarcomas can be diagnosed from formalin-fixed paraffin-embedded (FFPE) tissue with desktop HTS. Using the first generation MiSeq platform, full transcriptome sequencing was performed on FFPE material from archival blocks of 3 synovial sarcomas, 3 myxoid liposarcomas, 2 Ewing sarcomas, and 1 clear cell sarcoma. Mapping the reads to the "sarcomatome" (all known 83 genes involved in translocations and mutations in sarcoma) and using a novel algorithm for ranking fusion candidates, the pathognomonic fusions and the exact breakpoints were identified in all cases of synovial sarcoma, myxoid liposarcoma, and clear cell sarcoma. The Ewing sarcoma fusion gene was detectable in FFPE material only with a sequencing platform that generates greater sequencing depth. The results show that a single transcriptome HTS assay, from FFPE, has the potential to replace conventional molecular diagnostic techniques for the evaluation of clinically relevant mutations in cancer.

View details for DOI 10.1097/PAS.0b013e31827ad9b2

View details for PubMedID 23598961
The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome research Montgomery, S. B., Goode, D. L., Kvikstad, E., Albers, C. A., Zhang, Z. D., Mu, X. J., Ananda, G., Howie, B., Karczewski, K. J., Smith, K. S., Anaya, V., Richardson, R., Davis, J., MacArthur, D. G., Sidow, A., Duret, L., Gerstein, M., Makova, K. D., Marchini, J., McVean, G., Lunter, G. 2013; 23 (5): 749-761

Abstract

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.

View details for DOI 10.1101/gr.148718.112

View details for PubMedID 23478400

View details for PubMedCentralID PMC3638132
Examination of the relationship between variation at 17q21 and childhood wheeze phenotypes JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY Granell, R., Henderson, A. J., Timpson, N., St Pourcain, B., Kemp, J. P., Ring, S. M., Ho, K., Montgomery, S. B., Dermitzakis, E. T., Evans, D. M., Sterne, J. A. 2013; 131 (3): 685-694

Abstract

Genome-wide association studies have identified associations of genetic variants at 17q21 near ORMDL3 with childhood asthma. We sought to determine whether associations in this region are specific to particular asthma phenotypes and specific to ORMDL3. We examined associations between 244 independent single nucleotide polymorphisms (SNPs) plus 13 previously identified asthma-related SNPs in the region between 34 and 36 Mb on chromosome 17 and early wheezing phenotypes, doctor-diagnosed asthma and atopy at 7½ years, and bronchial hyperresponsiveness and lung function at 8½ years in 7045 children from the Avon Longitudinal Study of Parents and Children birth cohort study. With this, cis expression quantitative trait loci signals for the same SNPs were assessed in 875 samples across genes in the same region. The strongest evidence for phenotypic association was seen for persistent wheezing (rs8076131 near ORMDL3: relative risk ratio [RRR], 1.60 [95% CI, 1.40-1.84], P = 1.4 × 10^-11; rs2305480 near GSDML: RRR, 1.60 [95% CI, 1.39-1.83], P = 1.5 × 10^-11; and rs9303277 near IKZF3: RRR, 1.57 [95% CI, 1.37-1.79], P = 4.4 × 10^-11). Similar but less precisely estimated effects were seen for intermediate-onset wheeze, but there was little evidence of associations with other wheezing phenotypes. There was some evidence of associations with bronchial hyperresponsiveness. SNPs across the whole region show strong evidence of association with differential levels of expression at GSDML, IKZF3, and MED24, as well as ORMDL3. Associations of SNPs in the 17q21 locus are specific to asthma and specific wheezing phenotypes and are not explained by associations with intermediate phenotypes, such as atopy or lung function.

View details for DOI 10.1016/j.jaci.2012.09.021

View details for Web of Science ID 000315587800008

View details for PubMedID 23154084
Integrating GWAS and Expression Data for Functional Characterization of Disease-Associated SNPs: An Application to Follicular Lymphoma AMERICAN JOURNAL OF HUMAN GENETICS Conde, L., Bracci, P. M., Richardson, R., Montgomery, S. B., Skibola, C. F. 2013; 92 (1): 126-130

Abstract

Development of post-GWAS (genome-wide association study) methods is greatly needed for characterizing the function of trait-associated SNPs. Strategies integrating various biological data sets with GWAS results will provide insights into the mechanistic role of associated SNPs. Here, we present a method that integrates RNA sequencing (RNA-seq) and allele-specific expression data with GWAS data to further characterize SNPs associated with follicular lymphoma (FL). We investigated the influence on gene expression of three established FL-associated loci (rs10484561, rs2647012, and rs6457327) by measuring their correlation with human-leukocyte-antigen (HLA) expression levels obtained from publicly available RNA-seq expression data sets from lymphoblastoid cell lines. Our results suggest that SNPs linked to the protective variant rs2647012 exert their effect by a cis-regulatory mechanism involving modulation of HLA-DQB1 expression. In contrast, no effect on HLA expression was observed for the colocalized risk variant rs10484561. The application of integrative methods, such as those presented here, to other post-GWAS investigations will help identify causal disease variants and enhance our understanding of biological disease mechanisms.

View details for DOI 10.1016/j.ajhg.2012.11.009

View details for Web of Science ID 000313759000013

View details for PubMedID 23246294

View details for PubMedCentralID PMC3542469
Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife Gutierrez-Arcelus, M., Lappalainen, T., Montgomery, S. B., Buil, A., Ongen, H., Yurovsky, A., Bryois, J., Giger, T., Romano, L., Planchon, A., Falconnet, E., Bielser, D., Gagnebin, M., Padioleau, I., Borel, C., Letourneau, A., Makrythanasis, P., Guipponi, M., Gehrig, C., Antonarakis, S. E., Dermitzakis, E. T. 2013; 2

Abstract

DNA methylation is an essential epigenetic mark whose role in gene regulation and its dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the association between DNA methylation and gene expression variation among individuals is likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and active role of DNA methylation to regulatory interactions influencing gene expression.

View details for DOI 10.7554/eLife.00523

View details for PubMedID 23755361
Cancer Transcriptome Sequencing and Analysis Cancer Genomics: From Bench to Personalized Medicine Morin, R. D., Montgomery, S. B. Elsevier. 2013; 1: 31–49

View details for DOI 10.1016/B978-0-12-396967-5.00003-7
Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge. PloS one Mostafavi, S., Battle, A., Zhu, X., Urban, A. E., Levinson, D., Montgomery, S. B., Koller, D. 2013; 8 (7)

Abstract

Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.

View details for DOI 10.1371/journal.pone.0068141

View details for PubMedID 23874524
Detection and impact of rare regulatory variants in human disease. Frontiers in genetics Li, X., Montgomery, S. B. 2013; 4: 67-?

Abstract

Advances in genome sequencing are providing unprecedented resolution of rare and private variants. However, methods which assess the effect of these variants have relied predominantly on information within coding sequences. Assessing their impact in non-coding sequences remains a significant contemporary challenge. In this review, we highlight the role of regulatory variation as causative agents and modifiers of monogenic disorders. We further discuss how advances in functional genomics are now providing new opportunity to assess the impact of rare non-coding variants and their role in disease.

View details for DOI 10.3389/fgene.2013.00067

View details for PubMedID 23755067

View details for PubMedCentralID PMC3668132
Sex-biased genetic effects on gene regulation in humans GENOME RESEARCH Dimas, A. S., Nica, A. C., Montgomery, S. B., Stranger, B. E., Raj, T., Buil, A., Giger, T., Lappalainen, T., Gutierrez-Arcelus, M., McCarthy, M. I., Dermitzakis, E. T. 2012; 22 (12): 2368-2375

Abstract

Human regulatory variation, reported as expression quantitative trait loci (eQTLs), contributes to differences between populations and tissues. The contribution of eQTLs to differences between sexes, however, has not been investigated to date. Here we explore regulatory variation in females and males and demonstrate that 12%-15% of autosomal eQTLs function in a sex-biased manner. We show that genes possessing sex-biased eQTLs are expressed at similar levels across the sexes and highlight cases of genes controlling sexually dimorphic and shared traits that are under the control of distinct regulatory elements in females and males. This study illustrates that sex provides important context that can modify the effects of functional genetic variants.

View details for DOI 10.1101/gr.134981.111

View details for Web of Science ID 000311895500005

View details for PubMedID 22960374
Mapping cis- and trans-regulatory effects across multiple tissues in twins NATURE GENETICS Grundberg, E., Small, K. S., Hedman, A. K., Nica, A. C., Buil, A., Keildson, S., Bell, J. T., Yang, T., Meduri, E., Barrett, A., Nisbett, J., Sekowska, M., Wilk, A., Shin, S., Glass, D., Travers, M., Min, J. L., Ring, S., Ho, K., Thorleifsson, G., Kong, A., Thorsteindottir, U., Ainali, C., Dimas, A. S., Hassanali, N., Ingle, C., Knowles, D., Krestyaninova, M., Lowe, C. E., Di Meglio, P., Montgomery, S. B., Parts, L., Potter, S., Surdulescu, G., Tsaprouni, L., Tsoka, S., Bataille, V., Durbin, R., Nestle, F. O., O'Rahilly, S., Soranzo, N., Lindgren, C. M., Zondervan, K. T., Ahmadi, K. R., Schadt, E. E., Stefansson, K., Smith, G. D., McCarthy, M. I., Deloukas, P., Dermitzakis, E. T., Spector, T. D. 2012; 44 (10): 1084-?

Abstract

Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.

View details for DOI 10.1038/ng.2394

View details for Web of Science ID 000309550200006

View details for PubMedID 22941192
Genotype-Based Test in Mapping Cis-Regulatory Variants from Allele-Specific Expression Data PLOS ONE Lefebvre, J. F., Vello, E., Ge, B., Montgomery, S. B., Dermitzakis, E. T., Pastinen, T., Labuda, D. 2012; 7 (6)

Abstract

Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning genotypes of individuals exhibiting AI and those not exhibiting AI in a 2×3 contingency table. The performance of this test in detecting linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests as the distance separating the regulatory region from its regulated transcript increases. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
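
As a rough illustration of the 2×3 contingency-table idea described in this abstract, the sketch below tabulates genotype counts at one SNP for individuals with and without AI and tests for association. The abstract does not state which test statistic the authors used, so the Pearson chi-square statistic here is an assumption, and the function name `genotype_ai_test` and the example counts are hypothetical.

```python
import math


def genotype_ai_test(ai_counts, no_ai_counts):
    """Pearson chi-square test on a 2x3 table of genotype counts.

    Rows: individuals exhibiting AI / not exhibiting AI.
    Columns: the three genotypes at one SNP (e.g. AA, AB, BB).
    Assumption: a plain Pearson chi-square statistic stands in for the
    published test, whose exact form the abstract does not give.
    """
    table = [list(ai_counts), list(no_ai_counts)]
    if any(len(row) != 3 for row in table):
        raise ValueError("each row needs exactly three genotype counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column must contain at least one individual")
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Degrees of freedom = (2 - 1) * (3 - 1) = 2, and the chi-square
    # survival function with 2 degrees of freedom is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical counts: AI individuals are mostly heterozygous at the SNP.
chi2, p = genotype_ai_test(ai_counts=(2, 26, 2), no_ai_counts=(14, 4, 12))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

Because the table is built from unphased genotype counts alone, no haplotype inference is needed, which is the property the abstract emphasizes.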

View details for DOI 10.1371/journal.pone.0038667

View details for Web of Science ID 000305351700058

View details for PubMedID 22685595
Patterns of Cis Regulatory Variation in Diverse Human Populations PLOS GENETICS Stranger, B. E., Montgomery, S. B., Dimas, A. S., Parts, L., Stegle, O., Ingle, C. E., Sekowska, M., Smith, G. D., Evans, D., Gutierrez-Arcelus, M., Price, A., Raj, T., Nisbett, J., Nica, A. C., Beazley, C., Durbin, R., Deloukas, P., Dermitzakis, E. T. 2012; 8 (4): 272-284

Abstract

The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.

View details for DOI 10.1371/journal.pgen.1002639

View details for Web of Science ID 000303441800020

View details for PubMedID 22532805
A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes SCIENCE MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., Jostins, L., Habegger, L., Pickrell, J. K., Montgomery, S. B., Albers, C. A., Zhang, Z. D., Conrad, D. F., Lunter, G., Zheng, H., Ayub, Q., DePristo, M. A., Banks, E., Hu, M., Handsaker, R. E., Rosenfeld, J. A., Fromer, M., Jin, M., Mu, X. J., Khurana, E., Ye, K., Kay, M., Saunders, G. I., Suner, M., Hunt, T., Barnes, I. H., Amid, C., Carvalho-Silva, D. R., Bignell, A. H., Snow, C., Yngvadottir, B., Bumpstead, S., Cooper, D. N., Xue, Y., Romero, I. G., Wang, J., Li, Y., Gibbs, R. A., McCarroll, S. A., Dermitzakis, E. T., Pritchard, J. K., Barrett, J. C., Harrow, J., Hurles, M. E., Gerstein, M. B., Tyler-Smith, C. 2012; 335 (6070): 823-828

Abstract

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

View details for DOI 10.1126/science.1215040

View details for Web of Science ID 000300356400036

View details for PubMedID 22344438

View details for PubMedCentralID PMC3299548
Meta-analysis of genome-wide association studies identifies three new risk loci for atopic dermatitis NATURE GENETICS Paternoster, L., Standl, M., Chen, C., Ramasamy, A., Bonnelykke, K., Duijts, L., Ferreira, M. A., Alves, A. C., Thyssen, J. P., Albrecht, E., Baurecht, H., Feenstra, B., Sleiman, P. M., Hysi, P., Warrington, N. M., Curjuric, I., Myhre, R., Curtin, J. A., Groen-Blokhuis, M. M., Kerkhof, M., Saaf, A., Franke, A., Ellinghaus, D., Foelster-Holst, R., Dermitzakis, E., Montgomery, S. B., Prokisch, H., Heim, K., Hartikainen, A., Pouta, A., Pekkanen, J., Blakemore, A. I., Buxton, J. L., Kaakinen, M., Duffy, D. L., Madden, P. A., Heath, A. C., Montgomery, G. W., Thompson, P. J., Matheson, M. C., Le Souef, P., St Pourcain, B., Smith, G. D., Henderson, J., Kemp, J. P., Timpson, N. J., Deloukas, P., Ring, S. M., Wichmann, H., Mueller-Nurasyid, M., Novak, N., Klopp, N., Rodriguez, E., McArdle, W., Linneberg, A., Menne, T., Nohr, E. A., Hofman, A., Uitterlinden, A. G., van Duijin, C. M., Rivadeneira, F., de Jongste, J. C., van der Valk, R. J., Wjst, M., Jogi, R., Geller, F., Boyd, H. A., Murray, J. C., Kim, C., Mentch, F., March, M., Mangino, M., Spector, T. D., Bataille, V., Pennell, C. E., Holt, P. G., Sly, P., Tiesler, C. M., Thiering, E., Illig, T., Imboden, M., Nystad, W., Simpson, A., Hottenga, J., Postma, D., Koppelman, G. H., Smit, H. A., Soderhall, C., Chawes, B., Kreiner-Moller, E., Bisgaard, H., Melen, E., Boomsma, D. I., Custovic, A., Jacobsson, B., Probst-Hensch, N. M., Palmer, L. J., Glass, D., Hakonarson, H., Melbye, M., Jarvis, D. L., Jaddoe, V. W., Gieger, C., Strachan, D. P., Martin, N. G., Jarvelin, M., Heinrich, J., Evans, D. M., Weidinger, S. 2012; 44 (2): 187-192

Abstract

Atopic dermatitis (AD) is a commonly occurring chronic skin disease with high heritability. Apart from filaggrin (FLG), the genes influencing atopic dermatitis are largely unknown. We conducted a genome-wide association meta-analysis of 5,606 affected individuals and 20,565 controls from 16 population-based cohorts and then examined the ten most strongly associated new susceptibility loci in an additional 5,419 affected individuals and 19,833 controls from 14 studies. Three SNPs reached genome-wide significance in the discovery and replication cohorts combined, including rs479844 upstream of OVOL1 (odds ratio (OR) = 0.88, P = 1.1 × 10^-13) and rs2164983 near ACTL9 (OR = 1.16, P = 7.1 × 10^-9), both of which are near genes that have been implicated in epidermal proliferation and differentiation, as well as rs2897442 in KIF3A within the cytokine cluster at 5q31.1 (OR = 1.11, P = 3.8 × 10^-8). We also replicated association with the FLG locus and with two recently identified association signals at 11q13.5 (rs7927894; P = 0.008) and 20q13.33 (rs6010620; P = 0.002). Our results underline the importance of both epidermal barrier function and immune dysregulation in atopic dermatitis pathogenesis.

View details for DOI 10.1038/ng.1017

View details for Web of Science ID 000299664400018

View details for PubMedID 22197932

View details for PubMedCentralID PMC3272375
DNA methylation profiles of human active and inactive X chromosomes GENOME RESEARCH Sharp, A. J., Stathaki, E., Migliavacca, E., Brahmachary, M., Montgomery, S. B., Dupre, Y., Antonarakis, S. E. 2011; 21 (10): 1592-1600

Abstract

X-chromosome inactivation (XCI) is a dosage compensation mechanism that silences the majority of genes on one X chromosome in each female cell. To characterize epigenetic changes that accompany this process, we measured DNA methylation levels in 45,X patients carrying a single active X chromosome (X(a)), and in normal females, who carry one X(a) and one inactive X (X(i)). Methylated DNA was immunoprecipitated and hybridized to high-density oligonucleotide arrays covering the X chromosome, generating epigenetic profiles of active and inactive X chromosomes. We observed that XCI is accompanied by changes in DNA methylation specifically at CpG islands (CGIs). While the majority of CGIs show increased methylation levels on the X(i), XCI actually results in significant reductions in methylation at 7% of CGIs. Both intra- and inter-genic CGIs undergo epigenetic modification, with the biggest increase in methylation occurring at the promoters of genes silenced by XCI. In contrast, genes escaping XCI generally have low levels of promoter methylation, while genes that show inter-individual variation in silencing show intermediate increases in methylation. Thus, promoter methylation and susceptibility to XCI are correlated. We also observed a global correlation between CGI methylation and the evolutionary age of X-chromosome strata, and that genes escaping XCI show increased methylation within gene bodies. We used our epigenetic map to predict 26 novel genes escaping XCI, and searched for parent-of-origin-specific methylation differences, but found no evidence to support imprinting on the human X chromosome. Our study provides a detailed analysis of the epigenetic profile of active and inactive X chromosomes.

View details for DOI 10.1101/gr.112680.110

View details for Web of Science ID 000295407800004

View details for PubMedID 21862626
Epistatic Selection between Coding and Regulatory Variation in Human Evolution and Disease AMERICAN JOURNAL OF HUMAN GENETICS Lappalainen, T., Montgomery, S. B., Nica, A. C., Dermitzakis, E. T. 2011; 89 (3): 459-463

Abstract

Interaction (nonadditive effects) between genetic variants has been highlighted as an important mechanism underlying phenotypic variation, but the discovery of genetic interactions in humans has proved difficult. In this study, we show that the spectrum of variation in the human genome has been shaped by modifier effects of cis-regulatory variation on the functional impact of putatively deleterious protein-coding variants. We analyzed 1000 Genomes population-scale resequencing data from Europe (CEU [Utah residents with Northern and Western European ancestry from the CEPH collection]) and Africa (YRI [Yoruba in Ibadan, Nigeria]) together with gene expression data from arrays and RNA sequencing for the same samples. We observed an underrepresentation of derived putatively functional coding variation on the more highly expressed regulatory haplotype, which suggests stronger purifying selection against deleterious coding variants that have increased penetrance because of their regulatory background. Furthermore, the frequency spectrum and impact size distribution of common regulatory polymorphisms (eQTLs) appear to be shaped in order to minimize the selective disadvantage of having deleterious coding mutations on the more highly expressed haplotype. Interestingly, eQTLs explaining common disease GWAS signals showed an enrichment of putative epistatic effects, suggesting that some disease associations might arise from interactions increasing the penetrance of rare coding variants. In conclusion, our results indicate that regulatory and coding variants often modify the functional impact of each other. This specific type of genetic interaction is detectable from sequencing data in a genome-wide manner, and characterizing these joint effects might help us understand functional mechanisms behind genetic associations to human phenotypes, including both Mendelian and common disease.

View details for DOI 10.1016/j.ajhg.2011.08.004

View details for Web of Science ID 000294939800012

View details for PubMedID 21907014
Rare and Common Regulatory Variation in Population-Scale Sequenced Human Genomes PLOS GENETICS Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M., Dermitzakis, E. T. 2011; 7 (7)

Abstract

Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

View details for DOI 10.1371/journal.pgen.1002144

View details for Web of Science ID 000293338600007

View details for PubMedID 21811411
Genome-wide association study identifies a common variant associated with risk of endometrial cancer NATURE GENETICS Spurdle, A. B., Thompson, D. J., Ahmed, S., Ferguson, K., Healey, C. S., O'Mara, T., Walker, L. C., Montgomery, S. B., Dermitzakis, E. T., Fahey, P., Montgomery, G. W., Webb, P. M., Fasching, P. A., Beckmann, M. W., Ekici, A. B., Hein, A., Lambrechts, D., Coenegrachts, L., Vergote, I., Amant, F., Salvesen, H. B., Trovik, J., Njolstad, T. S., Helland, H., Scott, R. J., Ashton, K., Proietto, T., Otton, G., Tomlinson, I., Gorman, M., Howarth, K., Hodgson, S., Garcia-Closas, M., Wentzensen, N., Yang, H., Chanock, S., Hall, P., Czene, K., Liu, J., Li, J., Shu, X., Zheng, W., Long, J., Xiang, Y., Shah, M., Morrison, J., Michailidou, K., Pharoah, P. D., Dunning, A. M., Easton, D. F. 2011; 43 (5): 451-?

Abstract

Endometrial cancer is the most common malignancy of the female genital tract in developed countries. To identify genetic variants associated with endometrial cancer risk, we performed a genome-wide association study involving 1,265 individuals with endometrial cancer (cases) from Australia and the UK and 5,190 controls from the Wellcome Trust Case Control Consortium. We compared genotype frequencies in cases and controls for 519,655 SNPs. Forty-seven SNPs that showed evidence of association with endometrial cancer in stage 1 were genotyped in 3,957 additional cases and 6,886 controls. We identified an endometrial cancer susceptibility locus close to HNF1B at 17q12 (rs4430796, P = 7.1 × 10^-10) that is also associated with risk of prostate cancer and is inversely associated with risk of type 2 diabetes.

View details for DOI 10.1038/ng.812

View details for Web of Science ID 000289972600015

View details for PubMedID 21499250
From expression QTLs to personalized transcriptomics NATURE REVIEWS GENETICS Montgomery, S. B., Dermitzakis, E. T. 2011; 12 (4): 277-282

Abstract

Approaches that combine expression quantitative trait loci (eQTLs) and genome-wide association (GWA) studies are offering new functional information about the aetiology of complex human traits and diseases. Improved study designs (which take into account technological advances in resolving the transcriptome, cell history and state, population of origin and diverse endophenotypes) are providing insights into the architecture of disease and the landscape of gene regulation in humans. Furthermore, these advances are helping to establish links between cellular effects and organismal traits.

View details for DOI 10.1038/nrg2969

View details for Web of Science ID 000288531700011

View details for PubMedID 21386863
The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study PLOS GENETICS Nica, A. C., Parts, L., Glass, D., Nisbet, J., Barrett, A., Sekowska, M., Travers, M., Potter, S., Grundberg, E., Small, K., Hedman, A. K., Bataille, V., Bell, J. T., Surdulescu, G., Dimas, A. S., Ingle, C., Nestle, F. O., Di Meglio, P., Min, J. L., Wilk, A., Hammond, C. J., Hassanali, N., Yang, T., Montgomery, S. B., O'Rahilly, S., Lindgren, C. M., Zondervan, K. T., Soranzo, N., Barroso, I., Durbin, R., Ahmadi, K., Deloukas, P., McCarthy, M. I., Dermitzakis, E. T., Spector, T. D. 2011; 7 (2)

Abstract

While there have been studies exploring regulatory variation in one or more tissues, the complexity of tissue-specificity in multiple primary tissues is not yet well understood. We explore in depth the role of cis-regulatory variation in three human tissues: lymphoblastoid cell lines (LCL), skin, and fat. The samples (156 LCL, 160 skin, 166 fat) were derived simultaneously from a subset of well-phenotyped healthy female twins of the MuTHER resource. We discover an abundance of cis-eQTLs in each tissue similar to previous estimates (858 or 4.7% of genes). In addition, we apply factor analysis (FA) to remove effects of latent variables, thus more than doubling the number of our discoveries (1,822 eQTL genes). The unique study design (Matched Co-Twin Analysis, MCTA) permits immediate replication of eQTLs using co-twins (93%-98%) and validation of the considerable gain in eQTL discovery after FA correction. We highlight the challenges of comparing eQTLs between tissues. After verifying previous significance threshold-based estimates of tissue-specificity, we show their limitations given their dependency on statistical power. We propose that continuous estimates of the proportion of tissue-shared signals and direct comparison of the magnitude of effect on the fold change in expression are essential properties that jointly provide a biologically realistic view of tissue-specificity. Under this framework we demonstrate that 30% of eQTLs are shared among the three tissues studied, while another 29% appear exclusively tissue-specific. However, even among the shared eQTLs, a substantial proportion (10%-20%) have significant differences in the magnitude of fold change between genotypic classes across tissues. Our results underline the need to account for the complexity of eQTL tissue-specificity in an effort to assess consequences of such variants for complex traits.

View details for DOI 10.1371/journal.pgen.1002003

View details for Web of Science ID 000287697300035

View details for PubMedID 21304890
Identification of cis- and trans- regulatory variation modulating microRNA expression levels in human fibroblasts GENOME RESEARCH Borel, C., Deutsch, S., Letourneau, A., Migliavacca, E., Montgomery, S. B., Dimas, A. S., Vejnar, C. E., Attar, H., Gagnebin, M., Gehrig, C., Falconnet, E., Dupre, Y., Dermitzakis, E. T., Antonarakis, S. E. 2011; 21 (1): 68-73

Abstract

MicroRNAs (miRNAs) are regulatory noncoding RNAs that affect the production of a significant fraction of human mRNAs via post-transcriptional regulation. Interindividual variation of the miRNA expression levels is likely to influence the expression of miRNA target genes and may therefore contribute to phenotypic differences in humans, including susceptibility to common disorders. The extent to which miRNA levels are genetically controlled is largely unknown. In this report, we assayed the expression levels of miRNAs in primary fibroblasts from 180 European newborns of the GenCord project and performed association analysis to identify eQTLs (expression quantitative trait loci). We detected robust expression for 121 miRNAs out of 365 interrogated. We have identified significant cis- (10%) and trans- (11%) eQTLs. Furthermore, we detected one genomic locus (rs1522653) that influences the expression levels of five miRNAs, thus unraveling a novel mechanism for coregulation of miRNA expression.

View details for DOI 10.1101/gr.109371.110

View details for Web of Science ID 000285868300007

View details for PubMedID 21147911
The functional spectrum of low-frequency coding variation GENOME BIOLOGY Marth, G. T., Yu, F., Indap, A. R., Garimella, K., Gravel, S., Leong, W. F., Tyler-Smith, C., Bainbridge, M., Blackwell, T., Zheng-Bradley, X., Chen, Y., Challis, D., Clarke, L., Ball, E. V., Cibulskis, K., Cooper, D. N., Fulton, B., Hartl, C., Koboldt, D., Muzny, D., Smith, R., Sougnez, C., Stewart, C., Ward, A., Yu, J., Xue, Y., Altshuler, D., Bustamante, C. D., Clark, A. G., Daly, M., DePristo, M., Flicek, P., Gabriel, S., Mardis, E., Palotie, A., Gibbs, R. 2011; 12 (9)

Abstract

Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.

View details for DOI 10.1186/gb-2011-12-9-r84

View details for Web of Science ID 000298926900001

View details for PubMedID 21917140
A map of human genome variation from population-scale sequencing NATURE Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., De La Vega, F. M., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D., Peltonen, L., Schafer, A. J., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E. S., Altshuler, D. L., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Jaffe, D. B., Shefler, E., Sougnez, C. L., Bentley, D. R., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K. J., Costa, G. L., Ichikawa, J. K., Lee, C. C., Sudbrak, R., Lehrach, H., Borodina, T. A., Dahl, A., Davydov, A. N., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A. V., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R. M., Burton, J., Carter, D. M., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H. P., Turner, D., De Witte, A., Giles, S., Gibbs, R. A., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Yang, H., Marth, G. T., Garrison, E. P., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., Daly, M. J., DePristo, M. A., Altshuler, D. L., Ball, A. D., Banks, E., Bloom, T., Browning, B. L., Cibulskis, K., Fennell, T. J., Garimella, K. V., Grossman, S. R., Handsaker, R. E., Hanna, M., Hartl, C., Jaffe, D. B., Kernytsky, A. M., Korn, J. M., Li, H., Maguire, J. R., McCarroll, S. A., McKenna, A., Nemesh, J. C., Philippakis, A. A., Poplin, R. E., Price, A., Rivas, M. A., Sabeti, P. C., Schaffner, S. F., Shefler, E., Shlyakhter, I. A., Cooper, D. N., Ball, E. V., Mort, M., Phillips, A. D., Stenson, P. D., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Bustamante, C. D., Clark, A. G., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R. N., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R. E., Zalunin, V., Zheng-Bradley, X., Korbel, J. O., Stuetz, A. M., Humphray, S., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Hyland, F. C., Manning, J. M., McLaughlin, S. F., Peckham, H. E., Sakarya, O., Sun, Y. A., Tsung, E. F., Batzer, M. A., Konkel, M. K., Walker, J. A., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Herwig, R., Parkhomchuk, D. V., Sherry, S. T., Agarwala, R., Khouri, H., Morgulis, A. O., Paschall, J. E., Phan, L. D., Rotmistrovsky, K. E., Sanders, R. D., Shumway, M. F., Xiao, C., McVean, G. A., Auton, A., Iqbal, Z., Lunter, G., Marchini, J. L., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D. W., Beckstrom-Sternberg, S. M., Christoforides, A., Kurdoglu, A. A., Pearson, J., Sinari, S. A., Tembe, W. D., Haussler, D., Hinrichs, A. S., Katzman, S. J., Kern, A., Kuhn, R. M., Przeworski, M., Hernandez, R. D., Howie, B., Kelley, J. L., Melton, S. C., Abecasis, G. R., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W. O., Ding, J., Kang, H. M., Lathrop, M., Liang, L., Moffatt, M. F., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zoellner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E. A., Zilversmit, M., Jorde, L., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Sahinalp, S. C., Sudmant, P. H., Mardis, E. R., Chen, K., Chinwalla, A., Ding, L., Koboldt, D. C., McLellan, M. D., Dooling, D., Weinstock, G., Wallis, J. W., Wendl, M. C., Zhang, Q., Durbin, R. M., Albers, C. A., Ayub, Q., Balasubramaniam, S., Barrett, J. C., Carter, D. M., Chen, Y., Conrad, D. F., Danecek, P., Dermitzakis, E. T., Hu, M., Huang, N., Hurles, M. E., Jin, H., Jostins, L., Keane, T. M., Keane, T. M., Le, S. Q., Lindsay, S., Long, Q., MacArthur, D. G., Montgomery, S. B., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Li, Y., Luo, R., Marth, G. T., Garrison, E. P., Kural, D., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., McCarroll, S. A., Banks, E., DePristo, M. A., Handsaker, R. E., Hartl, C., Korn, J. M., Li, H., Nemesh, J. C., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R. E., Zheng-Bradley, X., Korbel, J. O., Humphray, S., Cheetham, R. K., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Peckham, H. E., Sun, Y. A., Batzer, M. A., Konkel, M. K., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Chen, K., Chinwalla, A., Ding, L., McLellan, M. D., Wallis, J. W., Hurles, M. E., Conrad, D. F., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Gibbs, R. A., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G. T., Garrison, E. P., Indap, A., Leong, W. F., Quinlan, A. R., Stewart, C., Ward, A. N., Wu, J., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Garimella, K. V., Hartl, C., Shefler, E., Sougnez, C. L., Wilkinson, J., Clark, A. G., Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R. E., Zheng-Bradley, X., Sherry, S. T., Khouri, H. M., Paschall, J. E., Shumway, M. F., Xiao, C., McVean, G. A., Katzman, S. J., Abecasis, G. R., Blackwell, T., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Koboldt, D. C., Durbin, R. M., Balasubramaniam, S., Coffey, A., Keane, T. M., MacArthur, D. G., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M. B., Balasubramanian, S., Chakravarti, A., Knoppers, B. M., Peltonen, L., Abecasis, G. R., Bustamante, C. D., Gharani, N., Gibbs, R. A., Jorde, L., Kaye, J. S., Kent, A., Li, T., McGuire, A. L., McVean, G. A., Ossorio, P. N., Rotimi, C. N., Su, Y., Toji, L. H., Tyler-Smith, C., Brooks, L. D., Felsenfeld, A. L., McEwen, J. E., Abdallah, A., Juenger, C. R., Clemm, N. C., Collins, F. S., Duncanson, A., Green, E. D., Guyer, M. S., Peterson, J. L., Schafer, A. J., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., Hurles, M. E., McVean, G. A. 2010; 467 (7319): 1061-1073

Abstract

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

View details for DOI 10.1038/nature09534

View details for Web of Science ID 000283548600039

View details for PubMedCentralID PMC3042601
Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies BIOINFORMATICS Yang, T., Beazley, C., Montgomery, S. B., Dimas, A. S., Gutierrez-Arcelus, M., Stranger, B. E., Deloukas, P., Dermitzakis, E. T. 2010; 26 (19): 2474-2476

Abstract

Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. http://www.sanger.ac.uk/resources/software/genevar

View details for DOI 10.1093/bioinformatics/btq452

View details for Web of Science ID 000282170000023

View details for PubMedID 20702402
Integrating common and rare genetic variation in diverse human populations NATURE Altshuler, D. M., Gibbs, R. A., Peltonen, L., Dermitzakis, E., Schaffner, S. F., Yu, F., Bonnen, P. E., de Bakker, P. I., Deloukas, P., Gabriel, S. B., Gwilliam, R., Hunt, S., Inouye, M., Jia, X., Palotie, A., Parkin, M., Whittaker, P., Chang, K., Hawes, A., Lewis, L. R., Ren, Y., Wheeler, D., Muzny, D. M., Barnes, C., Darvishi, K., Hurles, M., Korn, J. M., Kristiansson, K., Lee, C., McCarroll, S. A., Nemesh, J., Keinan, A., Montgomery, S. B., Pollack, S., Price, A. L., Soranzo, N., Gonzaga-Jauregui, C., Anttila, V., Brodeur, W., Daly, M. J., Leslie, S., McVean, G., Moutsianas, L., Nguyen, H., Zhang, Q., Ghori, M. J., McGinnis, R., McLaren, W., Takeuchi, F., Grossman, S. R., Shlyakhter, I., Hostetter, E. B., Sabeti, P. C., Adebamowo, C. A., Foster, M. W., Gordon, D. R., Licinio, J., Manca, M. C., Marshall, P. A., Matsuda, I., Ngare, D., Wang, V. O., Reddy, D., Rotimi, C. N., Royal, C. D., Sharp, R. R., Zeng, C., Brooks, L. D., McEwen, J. E. 2010; 467 (7311): 52-58

Abstract

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of
View details for DOI 10.1038/nature09298

View details for Web of Science ID 000281461200033

View details for PubMedID 20811451
Transcriptome genetics using second generation sequencing in a Caucasian population NATURE Montgomery, S. B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R. P., Ingle, C., Nisbett, J., Guigo, R., Dermitzakis, E. T. 2010; 464 (7289): 773-U151

Abstract

Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (small nucleotide polymorphisms) leads to a larger discovery of eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.

View details for DOI 10.1038/nature08903

View details for Web of Science ID 000276205000048

View details for PubMedID 20220756
Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations PLOS GENETICS Nica, A. C., Montgomery, S. B., Dimas, A. S., Stranger, B. E., Beazley, C., Barroso, I., Dermitzakis, E. T. 2010; 6 (4)

Abstract

The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC) that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with both a single or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r^2) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal to prioritize relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.

View details for DOI 10.1371/journal.pgen.1000895

View details for Web of Science ID 000277354200012

View details for PubMedID 20369022
Out of the sequencer and into the wiki as we face new challenges in genome informatics. Genome biology Ning, Z., Montgomery, S. B. 2010; 11 (10): 308

Abstract

A report on the joint Cold Spring Harbor Laboratory/Wellcome Trust Conference 'Genome Informatics', 15-19 September 2010, Hinxton, Cambridge, UK.

View details for DOI 10.1186/gb-2010-11-10-308

View details for PubMedID 21067526
Annotating the regulatory genome. Methods in molecular biology (Clifton, N.J.) Montgomery, S. B., Kasaian, K., Jones, S. J., Griffith, O. L. 2010; 674: 313-349

Abstract

Determining the timing and molecular repertoire responsible for gene expression is fundamental to understanding a gene's function. Heritable differences in this character are increasingly regarded as explanatory for complex and common traits. For many known trait-predisposing genes, studies have sought to elucidate the associated logic behind gene regulation. However, there exist many challenges in deciphering these mechanisms. Among them, it is recognized that we have limited understanding of regulatory complexity, the current models of gene regulation have low specificity and any gene's regulatory logic is dependent on biological context. Addressing these limitations and defining the regulatory genome is an ongoing challenge for molecular biology. We discuss current efforts to define and annotate the regulatory genome by focusing on curation and text-mining activities. We further highlight the type of information and curation process for describing regulatory elements within the ORegAnno database ( www.oreganno.org ) and how the general standards for such information are changing.

View details for DOI 10.1007/978-1-60761-854-6_20

View details for PubMedID 20827601
The resolution of the genetics of gene expression HUMAN MOLECULAR GENETICS Montgomery, S. B., Dermitzakis, E. T. 2009; 18: R211-R215

Abstract

Understanding the influence of genetics on the molecular mechanisms underpinning human phenotypic diversity is fundamental to being able to predict health outcomes and treat disease. To interrogate the role of genetics on cellular state and function, gene expression has been extensively used. Past and present studies have highlighted important patterns of heritability, population differentiation and tissue-specificity in gene expression. Current and future studies are taking advantage of systems biology-based approaches and advances in sequencing technology: new methodology aims to translate regulatory networks to enrich pathways responsible for disease etiology and 2nd generation sequencing now offers single-molecular resolution of the transcriptome providing unprecedented information on the structural and genetic characteristics of gene expression. Such advances are leading to a future where rich cellular phenotypes will facilitate understanding of the transmission of genetic effect from the gene to organism.

View details for DOI 10.1093/hmg/ddp400

View details for Web of Science ID 000271265600012

View details for PubMedID 19808798
Common Regulatory Variation Impacts Gene Expression in a Cell Type-Dependent Manner SCIENCE Dimas, A. S., Deutsch, S., Stranger, B. E., Montgomery, S. B., Borel, C., Attar-Cohen, H., Ingle, C., Beazley, C., Arcelus, M. G., Sekowska, M., Gagnebin, M., Nisbett, J., Deloukas, P., Dermitzakis, E. T., Antonarakis, S. E. 2009; 325 (5945): 1246-1250

Abstract

Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expressive quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

View details for DOI 10.1126/science.1174148

View details for Web of Science ID 000269523200038

View details for PubMedID 19644074
Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants? DIABETOLOGIA Southam, L., Soranzo, N., Montgomery, S. B., Frayling, T. M., McCarthy, M. I., Barroso, I., Zeggini, E. 2009; 52 (9): 1846-1851

Abstract

According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci. We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (F_ST) for 17 type 2 diabetes and 13 obesity loci. We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We find a high F_ST for rs7901695 at TCF7L2, the largest type 2 diabetes effect size found to date. Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.

View details for DOI 10.1007/s00125-009-1419-3

View details for Web of Science ID 000268776100018

View details for PubMedID 19526209
Current computational methods for prioritizing candidate regulatory polymorphisms. Methods in molecular biology (Clifton, N.J.) Montgomery, S. 2009; 569: 89-114

Abstract

Discovery of DNA sequence variants responsible for human phenotypic variation is key to advances in molecular diagnostics and medicines. Historically, variants that alter the protein-coding sequence of genes have been targeted when attempting to identify a trait's etiology; this is done because the rules governing these regions are generally well-understood and candidate variants can be easily selected. However, the effects of variants on gene regulation are increasingly regarded as being as important as protein-coding variation in uncovering the nature of phenotypic variation. I discuss resources and methodology that have recently been developed to computationally prioritize variants that may alter gene expression.

View details for DOI 10.1007/978-1-59745-524-4_5

View details for PubMedID 19623487
ORegAnno: an open-access community-driven resource for regulatory annotation NUCLEIC ACIDS RESEARCH Griffith, O. L., Montgomery, S. B., Bernier, B., Chu, B., Kasaian, K., Aerts, S., Mahony, S., Sleumer, M. C., Bilenky, M., Haeussler, M., Griffith, M., Gallo, S. M., Giardine, B., Hooghe, B., Van Loo, P., Blanco, E., Ticoll, A., Lithwick, S., Portales-Casamar, E., Donaldson, I. J., Robertson, G., Wadelius, C., De Bleser, P., Vlieghe, D., Halfon, M. S., Wasserman, W., Hardison, R., Bergman, C. M., Jones, S. J. 2008; 36: D107-D113

Abstract

ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org.

View details for DOI 10.1093/nar/gkm967

View details for Web of Science ID 000252545400020

View details for PubMedID 18006570
Text-mining assisted regulatory annotation GENOME BIOLOGY Aerts, S., Haeussler, M., Van Vooren, S., Griffith, O. L., Hulpiau, P., Jones, S. J., Montgomery, S. B., Bergman, C. M. 2008; 9 (2)

Abstract

Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature. We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process. Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.

View details for DOI 10.1186/gb-2008-9-2-r31

View details for Web of Science ID 000254659300013

View details for PubMedID 18271954
Population genomics of human gene expression NATURE GENETICS Stranger, B. E., Nica, A. C., Forrest, M. S., Dimas, A., Bird, C. P., Beazley, C., Ingle, C. E., Dunning, M., Flicek, P., Koller, D., Montgomery, S., Tavare, S., Deloukas, P., Dermitzakis, E. T. 2007; 39 (10): 1217-1224

Abstract

Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.

View details for DOI 10.1038/ng2142

View details for Web of Science ID 000249737400017

View details for PubMedID 17873874

View details for PubMedCentralID PMC2683249
A survey of genomic properties for the detection of regulatory polymorphisms PLOS COMPUTATIONAL BIOLOGY Montgomery, S. B., Griffith, O. L., Schuetz, J. M., Brooks-Wilson, A., Jones, S. J. 2007; 3 (6): 1000-1010

Abstract

Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques has been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database (http://www.oreganno.org). We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.

View details for DOI 10.1371/journal.pcbi.0030106

View details for Web of Science ID 000249105500010

View details for PubMedID 17559298
ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation BIOINFORMATICS Montgomery, S. B., Griffith, O. L., Sleumer, M. C., Bergman, C. M., Bilenky, M., Pleasance, E. D., Prychyna, Y., Zhang, X., Jones, S. J. 2006; 22 (5): 637-640

Abstract

Our understanding of gene regulation is currently limited by our ability to collectively synthesize and catalogue transcriptional regulatory elements stored in scientific literature. Over the past decade, this task has become increasingly challenging as the accrual of biologically validated regulatory sequences has accelerated. To meet this challenge, novel community-based approaches to regulatory element annotation are required. Here, we present the Open Regulatory Annotation (ORegAnno) database as a dynamic collection of literature-curated regulatory regions, transcription factor binding sites and regulatory mutations (polymorphisms and haplotypes). ORegAnno has been designed to manage the submission, indexing and validation of new annotations from users worldwide. Submissions to ORegAnno are immediately cross-referenced to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed, where appropriate. ORegAnno is available directly through MySQL, Web services, and online at http://www.oreganno.org. All software is licensed under the Lesser GNU Public License (LGPL).

View details for DOI 10.1093/bioinformatics/btk027

View details for Web of Science ID 000235604400024

View details for PubMedID 16397004
cisRED: a database system for genome-scale computational discovery of regulatory elements NUCLEIC ACIDS RESEARCH Robertson, G., Bilenky, M., Lin, K., He, A., Yuen, W., Dagpinar, M., Varhol, R., Teague, K., Griffith, O. L., Zhang, X., Pan, Y., Hassel, M., Sleumer, M. C., Pan, W., Pleasance, E. D., Chuang, M., Hao, H., Li, Y. Y., Robertson, N., Fjell, C., Li, B., Montgomery, S. B., Astakhova, T., Zhou, J., Sander, J., Siddiqui, A. S., Jones, S. J. 2006; 34: D68-D73

Abstract

We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system (www.cisred.org). The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded or installed locally.

View details for DOI 10.1093/nar/gkj075

View details for Web of Science ID 000239307700015

View details for PubMedID 16381958
An application of peer-to-peer technology to the discovery, use and assessment of bioinformatics programs NATURE METHODS Montgomery, S. B., Fu, T., Guan, J., Lin, K., Jones, S. J. 2005; 2 (8): 563-563

View details for Web of Science ID 000230884500002

View details for PubMedID 16094378
Sockeye: A 3D environment for comparative genomics GENOME RESEARCH Montgomery, S. B., Astakhova, T., Bilenky, M., Birney, E., Fu, T., Hassel, M., Melsopp, C., Rak, M., Robertson, A. G., Sleumer, M., Siddiqui, A. S., Jones, S. J. 2004; 14 (5): 956-962

Abstract

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.

View details for DOI 10.1101/gr.1890304

View details for Web of Science ID 000221171700022

View details for PubMedID 15123592
The genome sequence of the SARS-associated coronavirus SCIENCE Marra, M. A., Jones, S. J., Astell, C. R., Holt, R. A., Brooks-Wilson, A., Butterfield, Y. S., Khattra, J., Asano, J. K., Barber, S. A., Chan, S. Y., Cloutier, A., Coughlin, S. M., Freeman, D., Girn, N., Griffith, O. L., Leach, S. R., Mayo, M., McDonald, H., Montgomery, S. B., Pandoh, P. K., Petrescu, A. S., Robertson, A. G., Schein, J. E., Siddiqui, A., Smailus, D. E., Stott, J. E., Yang, G. S., Plummer, F., Andonov, A., Artsob, H., Bastien, N., Bernard, K., Booth, T. F., Bowness, D., Czub, M., Drebot, M., Fernando, L., Flick, R., Garbutt, M., Gray, M., Grolla, A., Jones, S., Feldmann, H., Meyers, A., Kabani, A., Li, Y., Normand, S., Stroher, U., Tipples, G. A., Tyler, S., Vogrig, R., Ward, D., Watson, B., Brunham, R. C., Krajden, M., Petric, M., Skowronski, D. M., Upton, C., Roper, R. L. 2003; 300 (5624): 1399-1404

Abstract

We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.

View details for DOI 10.1126/science.1085953

View details for Web of Science ID 000183181800036

View details for PubMedID 12730501
