Academic Appointments

Administrative Appointments

  • Director of Genome Informatics, Department of Pathology (2011 - Present)

Professional Education

  • B.A.Sc., University of British Columbia, Engineering Physics (2002)
  • Ph.D., University of British Columbia, Genetics (2006)

Current Research and Scholarly Interests

We study how genome variation affects cellular phenotypes, and we model disease at the cellular level, using genomic approaches such as next-generation RNA sequencing together with state-of-the-art bioinformatics and statistical genetics methods that we develop and apply. See our website at

All Publications

  • Small RNA Sequencing in Cells and Exosomes Identifies eQTLs and 14q32 as a Region of Active Export G3-GENES GENOMES GENETICS Tsang, E. K., Abell, N. S., Li, X., Anaya, V., Karczewski, K. J., Knowles, D. A., Sierra, R. G., Smith, K. S., Montgomery, S. B. 2017; 7 (1): 31-39
  • Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nature methods Hess, G. T., Frésard, L., Han, K., Lee, C. H., Li, A., Cimprich, K. A., Montgomery, S. B., Bassik, M. C. 2016


    Engineering and study of protein function by directed evolution has been limited by the technical requirement to use global mutagenesis or introduce DNA libraries. Here, we develop CRISPR-X, a strategy to repurpose the somatic hypermutation machinery for protein engineering in situ. Using catalytically inactive dCas9 to recruit variants of cytidine deaminase (AID) with MS2-modified sgRNAs, we can specifically mutagenize endogenous targets with limited off-target damage. This generates diverse libraries of localized point mutations and can target multiple genomic locations simultaneously. We mutagenize GFP and select for spectrum-shifted variants, including EGFP. Additionally, we mutate the target of the cancer therapeutic bortezomib, PSMB5, and identify known and novel mutations that confer bortezomib resistance. Finally, using a hyperactive AID variant, we mutagenize loci both upstream and downstream of transcriptional start sites. These experiments illustrate a powerful approach to create complex libraries of genetic variants in native context, which is broadly applicable to investigate and improve protein function.

    View details for DOI 10.1038/nmeth.4038

    View details for PubMedID 27798611

  • Small RNA Sequencing in Cells and Exosomes Identifies eQTLs and 14q32 as a Region of Active Export. G3 (Bethesda, Md.) Tsang, E. K., Abell, N. S., Li, X., Anaya, V., Karczewski, K. J., Knowles, D. A., Sierra, R. G., Smith, K. S., Montgomery, S. B. 2016


    Exosomes are small extracellular vesicles that carry heterogeneous cargo, including RNA, between cells. Increasing evidence suggests that exosomes are important mediators of intercellular communication and biomarkers of disease. Despite this, the variability of exosomal RNA between individuals has not been well quantified. To assess this variability, we sequenced the small RNA of cells and exosomes from a 17-member family. Across individuals, we show that selective export of miRNAs occurs not only at the level of specific transcripts, but that a cluster of 74 mature miRNAs on chromosome 14q32 is massively exported in exosomes while mostly absent from cells. We also observe more interindividual variability between exosomal samples than between cellular ones and identify four miRNA expression quantitative trait loci shared between cells and exosomes. Our findings indicate that genomically colocated miRNAs can be exported together and highlight the variability in exosomal miRNA levels between individuals as relevant for exosome use as diagnostics.

    View details for DOI 10.1534/g3.116.036137

    View details for PubMedID 27799337

  • DNA Methylation Profiling of Uniparental Disomy Subjects Provides a Map of Parental Epigenetic Bias in the Human Genome. American journal of human genetics Joshi, R. S., Garg, P., Zaitlen, N., Lappalainen, T., Watson, C. T., Azam, N., Ho, D., Li, X., Antonarakis, S. E., Brunner, H. G., Buiting, K., Cheung, S. W., Coffee, B., Eggermann, T., Francis, D., Geraedts, J. P., Gimelli, G., Jacobson, S. G., Le Caignec, C., de Leeuw, N., Liehr, T., Mackay, D. J., Montgomery, S. B., Pagnamenta, A. T., Papenhausen, P., Robinson, D. O., Ruivenkamp, C., Schwartz, C., Steiner, B., Stevenson, D. A., Surti, U., Wassink, T., Sharp, A. J. 2016; 99 (3): 555-566


    Genomic imprinting is a mechanism in which gene expression varies depending on parental origin. Imprinting occurs through differential epigenetic marks on the two parental alleles, with most imprinted loci marked by the presence of differentially methylated regions (DMRs). To identify sites of parental epigenetic bias, here we have profiled DNA methylation patterns in a cohort of 57 individuals with uniparental disomy (UPD) for 19 different chromosomes, defining imprinted DMRs as sites where the maternal and paternal methylation levels diverge significantly from the biparental mean. Using this approach we identified 77 DMRs, including nearly all those described in previous studies, in addition to 34 DMRs not previously reported. These include a DMR at TUBGCP5 within the recurrent 15q11.2 microdeletion region, suggesting potential parent-of-origin effects associated with this genomic disorder. We also observed a modest parental bias in DNA methylation levels at every CpG analyzed across ∼1.9 Mb of the 15q11-q13 Prader-Willi/Angelman syndrome region, demonstrating that the influence of imprinting is not limited to individual regulatory elements such as CpG islands, but can extend across entire chromosomal domains. Using RNA-seq data, we detected signatures consistent with imprinted expression associated with nine novel DMRs. Finally, using a population sample of 4,004 blood methylomes, we define patterns of epigenetic variation at DMRs, identifying rare individuals with global gain or loss of methylation across multiple imprinted loci. Our data provide a detailed map of parental epigenetic bias in the human genome, providing insights into potential parent-of-origin effects.

    View details for DOI 10.1016/j.ajhg.2016.06.032

    View details for PubMedID 27569549

  • Impact of the X Chromosome and sex on regulatory variation GENOME RESEARCH Kukurba, K. R., Parsana, P., Balliu, B., Smith, K. S., Zappala, Z., Knowles, D. A., Fave, M., Davis, J. R., Li, X., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Kundaje, A., Levinson, D. F., Awadalla, P., Mostafavi, S., Battle, A., Montgomery, S. B. 2016; 26 (6): 768-777


    The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences offers to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.

    View details for DOI 10.1101/gr.197897.115

    View details for Web of Science ID 000377090400005

    View details for PubMedID 27197214

  • An Efficient Multiple-Testing Adjustment for eQTL Studies that Accounts for Linkage Disequilibrium between Variants AMERICAN JOURNAL OF HUMAN GENETICS Davis, J. R., Fresard, L., Knowles, D. A., Pala, M., Bustamante, C. D., Battle, A., Montgomery, S. B. 2016; 98 (1): 216-224
  • ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic acids research Lesurf, R., Cotto, K. C., Wang, G., Griffith, M., Kasaian, K., Jones, S. J., Montgomery, S. B., Griffith, O. L. 2016; 44 (D1): D126-32


    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at

    View details for DOI 10.1093/nar/gkv1203

    View details for PubMedID 26578589

  • Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci. Nature communications Miller, C. L., Pjanic, M., Wang, T., Nguyen, T., Cohain, A., Lee, J. D., Perisic, L., Hedin, U., Kundu, R. K., Majmudar, D., Kim, J. B., Wang, O., Betsholtz, C., Ruusalepp, A., Franzén, O., Assimes, T. L., Montgomery, S. B., Schadt, E. E., Björkegren, J. L., Quertermous, T. 2016; 7: 12092


    Coronary artery disease (CAD) is the leading cause of mortality and morbidity, driven by both genetic and environmental risk factors. Meta-analyses of genome-wide association studies have identified >150 loci associated with CAD and myocardial infarction susceptibility in humans. A majority of these variants reside in non-coding regions and are co-inherited with hundreds of candidate regulatory variants, presenting a challenge to elucidate their functions. Herein, we use integrative genomic, epigenomic and transcriptomic profiling of perturbed human coronary artery smooth muscle cells and tissues to begin to identify causal regulatory variation and mechanisms responsible for CAD associations. Using these genome-wide maps, we prioritize 64 candidate variants and perform allele-specific binding and expression analyses at seven top candidate loci: 9p21.3, SMAD3, PDGFD, IL6R, BMP1, CCDC97/TGFB1 and LMOD1. We validate our findings in expression quantitative trait loci cohorts, which together reveal new links between CAD associations and regulatory function in the appropriate disease context.

    View details for DOI 10.1038/ncomms12092

    View details for PubMedID 27386823

  • A global reference for human genetic variation NATURE Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Donnelly, P., Eichler, E. E., Flicek, P., Gabriel, S. B., Gibbs, R. A., Green, E. D., Hurles, M. E., Knoppers, B. M., Korbel, J. O., Lander, E. S., Lee, C., Lehrach, H., Mardis, E. R., Marth, G. T., McVean, G. A., Nickerson, D. A., Schmidt, J. P., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Boerwinkle, E., Doddapaneni, H., Han, Y., Korchina, V., Kovar, C., Lee, S., Muzny, D., Reid, J. G., Zhu, Y., Wang, J., Chang, Y., Feng, Q., Fang, X., Guo, X., Jian, M., Jiang, H., Jin, X., Lan, T., Li, G., Li, J., Li, Y., Liu, S., Liu, X., Lu, Y., Ma, X., Tang, M., Wang, B., Wang, G., Wu, H., Wu, R., Xu, X., Yin, Y., Zhang, D., Zhang, W., Zhao, J., Zhao, M., Zheng, X., Lander, E. S., Altshuler, D. M., Gabriel, S. B., Gupta, N., Gharani, N., Toji, L. H., Gerry, N. P., Resch, A. M., Flicek, P., Barker, J., Clarke, L., Gil, L., Hunt, S. E., Kelman, G., Kulesha, E., Leinonen, R., McLaren, W. M., Radhakrishnan, R., Roa, A., Smirnov, D., Smith, R. E., Streeter, I., Thormann, A., Toneva, I., Vaughan, B., Zheng-Bradley, X., Bentley, D. R., Grocock, R., Humphray, S., James, T., Kingsbury, Z., Lehrach, H., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Borodina, T. A., Lienhard, M., Mertes, F., Sultan, M., Timmermann, B., Yaspo, M., Mardis, E. R., Wilson, R. K., Fulton, L., Fulton, R., Sherry, S. T., Ananiev, V., Belaia, Z., Beloslyudtsev, D., Bouk, N., Chen, C., Church, D., Cohen, R., Cook, C., Garner, J., Hefferon, T., Kimelman, M., Liu, C., Lopez, J., Meric, P., O'Sullivan, C., Ostapchuk, Y., Phan, L., Ponomarov, S., Schneider, V., Shekhtman, E., Sirotkin, K., Slotta, D., Zhang, H., McVean, G. A., Durbin, R. M., Balasubramaniam, S., Burton, J., Danecek, P., Keane, T. M., Kolb-Kokocinski, A., McCarthy, S., Stalker, J., Quail, M., Schmidt, J. P., Davies, C. 
J., Gollub, J., Webster, T., Wong, B., Zhan, Y., Auton, A., Campbell, C. L., Kong, Y., Marcketta, A., Gibbs, R. A., Yu, F., Antunes, L., Bainbridge, M., Muzny, D., Sabo, A., Huang, Z., Wang, J., Coin, L. J., Fang, L., Guo, X., Jin, X., Li, G., Li, Q., Li, Y., Li, Z., Lin, H., Liu, B., Luo, R., Shao, H., Xie, Y., Ye, C., Yu, C., Zhang, F., Zheng, H., Zhu, H., Alkan, C., Dal, E., Kahveci, F., Marth, G. T., Garrison, E. P., Kural, D., Lee, W., Leong, W. F., Stromberg, M., Ward, A. N., Wu, J., Zhang, M., Daly, M. J., DePristo, M. A., Handsaker, R. E., Altshuler, D. M., Banks, E., Bhatia, G., del Angel, G., Gabriel, S. B., Genovese, G., Gupta, N., Li, H., Kashin, S., Lander, E. S., McCarroll, S. A., Nemesh, J. C., Poplin, R. E., Yoon, S. C., Lihm, J., Makarov, V., Clark, A. G., Gottipati, S., Keinan, A., Rodriguez-Flores, J. L., Korbel, J. O., Rausch, T., Fritz, M. H., Stuetz, A. M., Flicek, P., Beal, K., Clarke, L., Datta, A., Herrero, J., McLaren, W. M., Ritchie, G. R., Smith, R. E., Zerbino, D., Zheng-Bradley, X., Sabeti, P. C., Shlyakhter, I., Schaffner, S. F., Vitti, J., Cooper, D. N., Ball, E. V., Stenson, P. D., Bentley, D. R., Barnes, B., Bauer, M., Cheetham, R. K., Cox, A., Eberle, M., Humphray, S., Kahn, S., Murray, L., Peden, J., Shaw, R., Kenny, E. E., Batzer, M. A., Konkel, M. K., Walker, J. A., MacArthur, D. G., Lek, M., Sudbrak, R., Amstislavskiy, V. S., Herwig, R., Mardis, E. R., Ding, L., Koboldt, D. C., Larson, D., Ye, K., Gravel, S., Swaroop, A., Chew, E., Lappalainen, T., Erlich, Y., Gymrek, M., Willems, T. F., Simpson, J. T., Shriver, M. D., Rosenfeld, J. A., Bustamante, C. D., Montgomery, S. B., De La Vega, F. M., Byrnes, J. K., Carroll, A. W., DeGorter, M. K., Lacroute, P., Maples, B. K., Martin, A. R., Moreno-Estrada, A., Shringarpure, S. S., Zakharia, F., Halperin, E., Baran, Y., Lee, C., Cerveira, E., Hwang, J., Malhotra, A., Plewczynski, D., Radew, K., Romanovitch, M., Zhang, C., Hyland, F. C., Craig, D. 
W., Christoforides, A., Homer, N., Izatt, T., Kurdoglu, A. A., Sinari, S. A., Squire, K., Sherry, S. T., Xiao, C., Sebat, J., Antaki, D., Gujral, M., Noor, A., Ye, K., Burchard, E. G., Hernandez, R. D., Gignoux, C. R., Haussler, D., Katzman, S. J., Kent, W. J., Howie, B., Ruiz-Linares, A., Dermitzakis, E. T., Devine, S. E., Goncalo, R. A., Kang, H. M., Kidd, J. M., Blackwell, T., Caron, S., Chen, W., Emery, S., Fritsche, L., Fuchsberger, C., Jun, G., Li, B., Lyons, R., Scheller, C., Sidore, C., Song, S., Sliwerska, E., Taliun, D., Tan, A., Welch, R., Wing, M. K., Zhan, X., Awadalla, P., Hodgkinson, A., Li, Y., Shi, X., Quitadamo, A., Lunter, G., McVean, G. A., Marchini, J. L., Myers, S., Churchhouse, C., Delaneau, O., Gupta-Hinch, A., Kretzschmar, W., Iqbal, Z., Mathieson, I., Menelaou, A., Rimmer, A., Xifara, D. K., Oleksyk, T. K., Fu, Y., Liu, X., Xiong, M., Jorde, L., Witherspoon, D., Xing, J., Eichler, E. E., Browning, B. L., Browning, S. R., Hormozdiari, F., Sudmant, P. H., Khurana, E., Durbin, R. M., Hurles, M. E., Tyler-Smith, C., Albers, C. A., Ayub, Q., Balasubramaniam, S., Chen, Y., Colonna, V., Danecek, P., Jostins, L., Keane, T. M., McCarthy, S., Walter, K., Xue, Y., Gerstein, M. B., Abyzov, A., Balasubramanian, S., Chen, J., Clarke, D., Fu, Y., Harmanci, A. O., Jin, M., Lee, D., Liu, J., Mu, X. J., Zhang, J., Zhang, Y., Li, Y., Luo, R., Zhu, H., Alkan, C., Dal, E., Kahveci, F., Marth, G. T., Garrison, E. P., Kural, D., Lee, W., Ward, A. N., Wu, J., Zhang, M., McCarroll, S. A., Handsaker, R. E., Altshuler, D. M., Banks, E., del Angel, G., Genovese, G., Hartl, C., Li, H., Kashin, S., Nemesh, J. C., Shakir, K., Yoon, S. C., Lihm, J., Makarov, V., Degenhardt, J., Korbel, J. O., Fritz, M. H., Meiers, S., Raeder, B., Rausch, T., Stuetz, A. M., Flicek, P., Casale, F. P., Clarke, L., Smith, R. E., Stegle, O., Zheng-Bradley, X., Bentley, D. R., Barnes, B., Cheetham, R. K., Eberle, M., Humphray, S., Kahn, S., Murray, L., Shaw, R., Lameijer, E., Batzer, M. 
A., Konkel, M. K., Walker, J. A., Ding, L., Hall, I., Ye, K., Lacroute, P., Lee, C., Cerveira, E., Malhotra, A., Hwang, J., Plewczynski, D., Radew, K., Romanovitch, M., Zhang, C., Craig, D. W., Homer, N., Church, D., Xiao, C., Sebat, J., Antaki, D., Bafna, V., Michaelson, J., Ye, K., Devine, S. E., Gardner, E. J., Abecasis, G. R., Kidd, J. M., Mills, R. E., Dayama, G., Emery, S., Jun, G., Shi, X., Quitadamo, A., Lunter, G., McVean, G. A., Chen, K., Fan, X., Chong, Z., Chen, T., Witherspoon, D., Xing, J., Eichler, E. E., Chaisson, M. J., Hormozdiari, F., Huddleston, J., Malig, M., Nelson, B. J., Sudmant, P. H., Parrish, N. F., Khurana, E., Hurles, M. E., Blackburne, B., Lindsay, S. J., Ning, Z., Walter, K., Zhang, Y., Gerstein, M. B., Abyzov, A., Chen, J., Clarke, D., Lam, H., Mu, X. J., Sisu, C., Zhang, J., Zhang, Y., Gibbs, R. A., Yu, F., Bainbridge, M., Challis, D., Evani, U. S., Kovar, C., Lu, J., Muzny, D., Nagaswamy, U., Reid, J. G., Sabo, A., Yu, J., Guo, X., Li, W., Li, Y., Wu, R., Marth, G. T., Garrison, E. P., Leong, W. F., Ward, A. N., del Angel, G., DePristo, M. A., Gabriel, S. B., Gupta, N., Hartl, C., Poplin, R. E., Clark, A. G., Rodriguez-Flores, J. L., Flicek, P., Clarke, L., Smith, R. E., Zheng-Bradley, X., MacArthur, D. G., Mardis, E. R., Fulton, R., Koboldt, D. C., Gravel, S., Bustamante, C. D., Craig, D. W., Christoforides, A., Homer, N., Izatt, T., Sherry, S. T., Xiao, C., Dermitzakis, E. T., Abecasis, G. R., Kang, H. M., McVean, G. A., Gerstein, M. B., Balasubramanian, S., Habegger, L., Yu, H., Flicek, P., Clarke, L., Cunningham, F., Dunham, I., Zerbino, D., Zheng-Bradley, X., Lage, K., Jespersen, J. B., Horn, H., Montgomery, S. B., DeGorter, M. K., Khurana, E., Tyler-Smith, C., Chen, Y., Colonna, V., Xue, Y., Gerstein, M. B., Balasubramanian, S., Fu, Y., Kim, D., Auton, A., Marcketta, A., DeSalle, R., Narechania, A., Sayres, M. A., Garrison, E. P., Handsaker, R. E., Kashin, S., McCarroll, S. A., Rodriguez-Flores, J. 
L., Flicek, P., Clarke, L., Zheng-Bradley, X., Erlich, Y., Gymrek, M., Willems, T. F., Bustamante, C. D., Mendez, F. L., Poznik, G. D., Underhill, P. A., Lee, C., Cerveira, E., Malhotra, A., Romanovitch, M., Zhang, C., Abecasis, G. R., Coin, L., Shao, H., Mittelman, D., Tyler-Smith, C., Ayub, Q., Banerjee, R., Cerezo, M., Chen, Y., Fitzgerald, T., Louzada, S., Massaia, A., McCarthy, S., Ritchie, G. R., Xue, Y., Yang, F., Gibbs, R. A., Kovar, C., Kalra, D., Hale, W., Muzny, D., Reid, J. G., Wang, J., Dan, X., Guo, X., Li, G., Li, Y., Ye, C., Zheng, X., Altshuler, D. M., Flicek, P., Clarke, L., Zheng-Bradley, X., Bentley, D. R., Cox, A., Humphray, S., Kahn, S., Sudbrak, R., Albrecht, M. W., Lienhard, M., Larson, D., Craig, D. W., Izatt, T., Kurdoglu, A. A., Sherry, S. T., Xiao, C., Haussler, D., Abecasis, G. R., McVean, G. A., Durbin, R. M., Balasubramaniam, S., Keane, T. M., McCarthy, S., Stalker, J., Chakravarti, A., Knoppers, B. M., Abecasis, G. R., Barnes, K. C., Beiswanger, C., Burchard, E. G., Bustamante, C. D., Cai, H., Cao, H., Durbin, R. M., Gerry, N. P., Gharani, N., Gibbs, R. A., Gignoux, C. R., Gravel, S., Henn, B., Jones, D., Jorde, L., Kaye, J. S., Keinan, A., Kent, A., Kerasidou, A., Li, Y., Mathias, R., McVean, G. A., Moreno-Estrada, A., Ossorio, P. N., Parker, M., Resch, A. M., Rotimi, C. N., Royal, C. D., Sandoval, K., Su, Y., Sudbrak, R., Tian, Z., Tishkoff, S., Toji, L. H., Tyler-Smith, C., Via, M., Wang, Y., Yang, H., Yang, L., Zhu, J., Bodmer, W., Bedoya, G., Ruiz-Linares, A., Cai, Z., Gao, Y., Chu, J., Peltonen, L., Garcia-Montero, A., Orfao, A., Dutil, J., Martinez-Cruzado, J. C., Oleksyk, T. K., Barnes, K. C., Mathias, R. A., Hennis, A., Watson, H., McKenzie, C., Qadri, F., LaRocque, R., Sabeti, P. C., Zhu, J., Deng, X., Sabeti, P. C., Asogun, D., Folarin, O., Happi, C., Omoniwa, O., Stremlau, M., Tariyal, R., Jallow, M., Joof, F. S., Corrah, T., Rockett, K., Kwiatkowski, D., Kooner, J., Tran Tinh Hien, T. T., Dunstan, S. 
J., Nguyen Thuy Hang, N. T., Fonnie, R., Garry, R., Kanneh, L., Moses, L., Sabeti, P. C., Schieffelin, J., Grant, D. S., Gallo, C., Poletti, G., Saleheen, D., Rasheed, A., Brook, L. D., Felsenfeld, A., McEwen, J. E., Vaydylevich, Y., Green, E. D., Duncanson, A., Dunn, M., Schloss, J. A., Wang, J., Yang, H., Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., Korbel, J. O., Marchini, J. L., McCarthy, S., McVean, G. A., Abecasis, G. R. 2015; 526 (7571): 68-?
  • The landscape of genomic imprinting across diverse adult human tissues GENOME RESEARCH Baran, Y., Subramaniam, M., Biton, A., Tukiainen, T., Tsang, E. K., Rivas, M. A., Pirinen, M., Gutierrez-Arcelus, M., Smith, K. S., Kukurba, K. R., Zhang, R., Eng, C., Torgerson, D. G., Urbanek, C., Li, J. B., Rodriguez-Santana, J. R., Burchard, E. G., Seibold, M. A., MacArthur, D. G., Montgomery, S. B., Zaitlen, N. A., Lappalainen, T. 2015; 25 (7): 927-936


    Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue-specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue-specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.

    View details for DOI 10.1101/gr.192278.115

    View details for Web of Science ID 000357356900001

    View details for PubMedID 25953952

  • Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science Rivas, M. A., Pirinen, M., Conrad, D. F., Lek, M., Tsang, E. K., Karczewski, K. J., Maller, J. B., Kukurba, K. R., DeLuca, D. S., Fromer, M., Ferreira, P. G., Smith, K. S., Zhang, R., Zhao, F., Banks, E., Poplin, R., Ruderfer, D. M., Purcell, S. M., Tukiainen, T., Minikel, E. V., Stenson, P. D., Cooper, D. N., Huang, K. H., Sullivan, T. J., Nedzel, J., Bustamante, C. D., Li, J. B., Daly, M. J., Guigo, R., Donnelly, P., Ardlie, K., Sammeth, M., Dermitzakis, E. T., McCarthy, M. I., Montgomery, S. B., Lappalainen, T., MacArthur, D. G. 2015; 348 (6235): 666-669


    Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.

    View details for DOI 10.1126/science.1261877

    View details for PubMedID 25954003

  • Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse. Nature genetics Babak, T., Deveale, B., Tsang, E. K., Zhou, Y., Li, X., Smith, K. S., Kukurba, K. R., Zhang, R., Li, J. B., van der Kooy, D., Montgomery, S. B., Fraser, H. B. 2015; 47 (5): 544-549


    Genomic imprinting is an epigenetic process that restricts gene expression to either the maternally or paternally inherited allele. Many theories have been proposed to explain its evolutionary origin, but understanding has been limited by a paucity of data mapping the breadth and dynamics of imprinting within any organism. We generated an atlas of imprinting spanning 33 mouse and 45 human developmental stages and tissues. Nearly all imprinted genes were imprinted in early development and either retained their parent-of-origin expression in adults or lost it completely. Consistent with an evolutionary signature of parental conflict, imprinted genes were enriched for coexpressed pairs of maternally and paternally expressed genes, showed accelerated expression divergence between human and mouse, and were more highly expressed than their non-imprinted orthologs in other species. Our approach demonstrates a general framework for the discovery of imprinting in any species and sheds light on the causes and consequences of genomic imprinting in mammals.

    View details for DOI 10.1038/ng.3274

    View details for PubMedID 25848752

  • Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing. PLoS genetics Gutierrez-Arcelus, M., Ongen, H., Lappalainen, T., Montgomery, S. B., Buil, A., Yurovsky, A., Bryois, J., Padioleau, I., Romano, L., Planchon, A., Falconnet, E., Bielser, D., Gagnebin, M., Giger, T., Borel, C., Letourneau, A., Makrythanasis, P., Guipponi, M., Gehrig, C., Antonarakis, S. E., Dermitzakis, E. T. 2015; 11 (1)


    Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.

    View details for DOI 10.1371/journal.pgen.1004958

    View details for PubMedID 25634236

  • RNA Sequencing and Analysis. Cold Spring Harbor protocols Kukurba, K. R., Montgomery, S. B. 2015; 2015 (11): pdb.top084970


    RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to provide insight into the transcriptome of a cell. Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq provides far higher coverage and greater resolution of the dynamic nature of the transcriptome. Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the discovery of novel transcripts, identification of alternatively spliced genes, and detection of allele-specific expression. Recent advances in the RNA-Seq workflow, from sample preparation to library construction to data analysis, have enabled researchers to further elucidate the functional complexity of the transcription. In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be applied to investigate different populations of RNA, including total RNA, pre-mRNA, and noncoding RNA, such as microRNA and long ncRNA. This article provides an introduction to RNA-Seq methods, including applications, experimental design, and technical challenges.

    View details for DOI 10.1101/pdb.top084970

    View details for PubMedID 25870306

  • Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing. Molecular psychiatry Mostafavi, S., Battle, A., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Beckman, K., Haudenschild, C., McCormick, C., Mei, R., Gameroff, M. J., Gindes, H., Adams, P., Goes, F. S., Mondimore, F. M., MacKinnon, D. F., Notes, L., Schweizer, B., Furman, D., Montgomery, S. B., Urban, A. E., Koller, D., Levinson, D. F. 2014; 19 (12): 1267-1274


    A study of genome-wide gene expression in major depressive disorder (MDD) was undertaken in a large population-based sample to determine whether altered expression levels of genes and pathways could provide insights into biological mechanisms that are relevant to this disorder. Gene expression studies have the potential to detect changes that may be because of differences in common or rare genomic sequence variation, environmental factors or their interaction. We recruited a European ancestry sample of 463 individuals with recurrent MDD and 459 controls, obtained self-report and semi-structured interview data about psychiatric and medical history and other environmental variables, sequenced RNA from whole blood and genotyped a genome-wide panel of common single-nucleotide polymorphisms. We used analytical methods to identify MDD-related genes and pathways using all of these sources of information. In analyses of association between MDD and expression levels of 13 857 single autosomal genes, accounting for multiple technical, physiological and environmental covariates, a significant excess of low P-values was observed, but there was no significant single-gene association after genome-wide correction. Pathway-based analyses of expression data detected significant association of MDD with increased expression of genes in the interferon α/β signaling pathway. This finding could not be explained by potentially confounding diseases and medications (including antidepressants) or by computationally estimated proportions of white blood cell types. Although cause-effect relationships cannot be determined from these data, the results support the hypothesis that altered immune signaling has a role in the pathogenesis, manifestation, and/or the persistence and progression of MDD. Molecular Psychiatry advance online publication, 3 December 2013; doi:10.1038/mp.2013.161.

    View details for DOI 10.1038/mp.2013.161

    View details for PubMedID 24296977

  • Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. American journal of human genetics Li, X., Battle, A., Karczewski, K. J., Zappala, Z., Knowles, D. A., Smith, K. S., Kukurba, K. R., Wu, E., Simon, N., Montgomery, S. B. 2014; 95 (3): 245-256


    Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.

    View details for DOI 10.1016/j.ajhg.2014.08.004

    View details for PubMedID 25192044

  • Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics Martin, A. R., Costa, H. A., Lappalainen, T., Henn, B. M., Kidd, J. M., Yee, M., Grubert, F., Cann, H. M., Snyder, M., Montgomery, S. B., Bustamante, C. D. 2014; 10 (8)


    Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

    View details for DOI 10.1371/journal.pgen.1004549

    View details for PubMedID 25121757

  • Cis and trans effects of human genomic variants on gene expression. PLoS genetics Bryois, J., Buil, A., Evans, D. M., Kemp, J. P., Montgomery, S. B., Conrad, D. F., Ho, K. M., Ring, S., Hurles, M., Deloukas, P., Davey Smith, G., Dermitzakis, E. T. 2014; 10 (7)


    Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to diseases in case of misregulation. In order to detect genetic variations affecting gene expression, we performed association analysis of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort in cis and in trans. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the level of expression of the cis genes was causally affecting the level of expression of the trans genes and discovered several causal relationships between variation in the level of expression of the cis gene and variation in the level of expression of the trans gene. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.

    View details for DOI 10.1371/journal.pgen.1004461

    View details for PubMedID 25010687

  • Determining causality and consequence of expression quantitative trait loci HUMAN GENETICS Battle, A., Montgomery, S. B. 2014; 133 (6): 727-735


    Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically-surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalog provides an important resource for understanding the molecular basis of common genetic diseases. However, only now has both the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.

    View details for DOI 10.1007/s00439-014-1446-0

    View details for Web of Science ID 000336317000005

    View details for PubMedID 24770875

  • Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues. PLoS genetics Kukurba, K. R., Zhang, R., Li, X., Smith, K. S., Knowles, D. A., How Tan, M., Piskol, R., Lek, M., Snyder, M., MacArthur, D. G., Li, J. B., Montgomery, S. B. 2014; 10 (5)


    Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

    View details for DOI 10.1371/journal.pgen.1004304

    View details for PubMedID 24786518

  • Dissecting the causal genetic mechanisms of coronary heart disease. Current atherosclerosis reports Miller, C. L., Assimes, T. L., Montgomery, S. B., Quertermous, T. 2014; 16 (5): 406-?


    Large-scale genome-wide association studies (GWAS) have identified 46 loci that are associated with coronary heart disease (CHD). Additionally, 104 independent candidate variants (false discovery rate of 5 %) have been identified (Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H et al. Nat Genet 43:333-8, 2011; Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR et al. Nat Genet 45:25-33, 2012; C4D Genetics Consortium. Nat Genet 43:339-44, 2011). The majority of the causal genes in these loci function independently of conventional risk factors. It is postulated that a number of the CHD-associated genes regulate basic processes in the vascular cells involved in atherosclerosis, and that study of the signaling pathways that are modulated in this cell type by causal regulatory variation will provide critical new insights for targeting the initiation and progression of disease. In this review, we will discuss the types of experimental approaches and data that are critical to understanding the molecular processes that underlie the disease risk at 9p21.3, TCF21, SORT1, and other CHD-associated loci.

    View details for DOI 10.1007/s11883-014-0406-4

    View details for PubMedID 24623178

  • SplicePlot: a utility for visualizing splicing quantitative trait loci. Bioinformatics Wu, E., Nance, T., Montgomery, S. B. 2014; 30 (7): 1025-1026


    RNA-Sequencing has provided unprecedented resolution of alternative splicing and splicing-quantitative trait loci (sQTL). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command line utility that produces intuitive visualization of sQTLs and their effects. SplicePlot takes mapped RNA-seq reads in BAM format and genotype data in VCF format as input and outputs publication quality sashimi plots, hive plots, and structure plots enabling better investigation and understanding of the role of genetics on alternative splicing and transcript structure. Availability and Implementation: Source code and detailed documentation are available under Resources and at GitHub. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available.

    View details for DOI 10.1093/bioinformatics/btt733

    View details for PubMedID 24363378

  • Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240


    The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.

    View details for PubMedID 24297550

  • Transcriptome analysis reveals differential splicing events in IPF lung tissue. PloS one Nance, T., Smith, K. S., Anaya, V., Richardson, R., Ho, L., Pala, M., Mostafavi, S., Battle, A., Feghali-Bostwick, C., Rosen, G., Montgomery, S. B. 2014; 9 (5)


    Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.

    View details for DOI 10.1371/journal.pone.0097550

    View details for PubMedID 24805851

  • High-resolution transcriptome analysis with long-read RNA sequencing. PloS one Cho, H., Davis, J., Li, X., Smith, K. S., Battle, A., Montgomery, S. B. 2014; 9 (9)


    RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcript. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

    View details for DOI 10.1371/journal.pone.0108095

    View details for PubMedID 25251678

  • Transcriptome Analysis Reveals Differential Splicing Events in IPF Lung Tissue. PloS one Nance, T., Smith, K. S., Anaya, V., Richardson, R., Ho, L., Pala, M., Mostafavi, S., Battle, A., Feghali-Bostwick, C., Rosen, G., Montgomery, S. B. 2014; 9 (3)

    View details for DOI 10.1371/journal.pone.0092111

    View details for PubMedID 24647608

  • Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals GENOME RESEARCH Battle, A., Mostafavi, S., Zhu, X., Potash, J. B., Weissman, M. M., McCormick, C., Haudenschild, C. D., Beckman, K. B., Shi, J., Mei, R., Urban, A. E., Montgomery, S. B., Levinson, D. F., Koller, D. 2014; 24 (1): 14-24


    Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation: by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.

    View details for DOI 10.1101/gr.155192.113

    View details for Web of Science ID 000329163500002

    View details for PubMedID 24092820

  • Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nature methods Zhang, R., Li, X., Ramaswami, G., Smith, K. S., Turecki, G., Montgomery, S. B., Li, J. B. 2014; 11 (1): 51-54


    We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.

    View details for DOI 10.1038/nmeth.2736

    View details for PubMedID 24270603

  • Transcriptome and genome sequencing uncovers functional variation in humans. Nature Lappalainen, T., Sammeth, M., Friedländer, M. R., 't Hoen, P. A., Monlong, J., Rivas, M. A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P. G., Barann, M., Wieland, T., Greger, L., van Iterson, M., Almlöf, J., Ribeca, P., Pulyakhina, I., Esser, D., Giger, T., Tikhonov, A., Sultan, M., Bertier, G., MacArthur, D. G., Lek, M., Lizano, E., Buermans, H. P., Padioleau, I., Schwarzmayr, T., Karlberg, O., Ongen, H., Kilpinen, H., Beltran, S., Gut, M., Kahlem, K., Amstislavskiy, V., Stegle, O., Pirinen, M., Montgomery, S. B., Donnelly, P., McCarthy, M. I., Flicek, P., Strom, T. M., Lehrach, H., Schreiber, S., Sudbrak, R., Carracedo, A., Antonarakis, S. E., Häsler, R., Syvänen, A., van Ommen, G., Brazma, A., Meitinger, T., Rosenstiel, P., Guigó, R., Gut, I. G., Estivill, X., Dermitzakis, E. T. 2013; 501 (7468): 506-511


    Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

    View details for DOI 10.1038/nature12531

    View details for PubMedID 24037378

  • Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612


    Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

    View details for DOI 10.1073/pnas.1219099110

    View details for PubMedID 23690573

  • Desktop transcriptome sequencing from archival tissue to identify clinically relevant translocations. American journal of surgical pathology Sweeney, R. T., Zhang, B., Zhu, S. X., Varma, S., Smith, K. S., Montgomery, S. B., van de Rijn, M., Zehnder, J., West, R. B. 2013; 37 (6): 796-803


    Somatic mutations, often translocations or single nucleotide variations, are pathognomonic for certain types of cancers and are increasingly of clinical importance for diagnosis and prediction of response to therapy. Conventional clinical assays only evaluate 1 mutation at a time, and targeted tests are often constrained to identify only the most common mutations. Genome-wide or transcriptome-wide high-throughput sequencing (HTS) of clinical samples offers an opportunity to evaluate for all clinically significant mutations with a single test. Recently a "desktop version" of HTS has become available, but most of the experience to date is based on data obtained from high-quality DNA from frozen specimens. In this study, we demonstrate, as a proof of principle, that translocations in sarcomas can be diagnosed from formalin-fixed paraffin-embedded (FFPE) tissue with desktop HTS. Using the first generation MiSeq platform, full transcriptome sequencing was performed on FFPE material from archival blocks of 3 synovial sarcomas, 3 myxoid liposarcomas, 2 Ewing sarcomas, and 1 clear cell sarcoma. Mapping the reads to the "sarcomatome" (all known 83 genes involved in translocations and mutations in sarcoma) and using a novel algorithm for ranking fusion candidates, the pathognomonic fusions and the exact breakpoints were identified in all cases of synovial sarcoma, myxoid liposarcoma, and clear cell sarcoma. The Ewing sarcoma fusion gene was detectable in FFPE material only with a sequencing platform that generates greater sequencing depth. The results show that a single transcriptome HTS assay, from FFPE, has the potential to replace conventional molecular diagnostic techniques for the evaluation of clinically relevant mutations in cancer.

    View details for DOI 10.1097/PAS.0b013e31827ad9b2

    View details for PubMedID 23598961

  • The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome research Montgomery, S. B., Goode, D. L., Kvikstad, E., Albers, C. A., Zhang, Z. D., Mu, X. J., Ananda, G., Howie, B., Karczewski, K. J., Smith, K. S., Anaya, V., Richardson, R., Davis, J., MacArthur, D. G., Sidow, A., Duret, L., Gerstein, M., Makova, K. D., Marchini, J., McVean, G., Lunter, G. 2013; 23 (5): 749-761


    Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.

    View details for DOI 10.1101/gr.148718.112

    View details for PubMedID 23478400

  • Examination of the relationship between variation at 17q21 and childhood wheeze phenotypes JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY Granell, R., Henderson, A. J., Timpson, N., St Pourcain, B., Kemp, J. P., Ring, S. M., Ho, K., Montgomery, S. B., Dermitzakis, E. T., Evans, D. M., Sterne, J. A. 2013; 131 (3): 685-694


    Genome-wide association studies have identified associations of genetic variants at 17q21 near ORMDL3 with childhood asthma. We sought to determine whether associations in this region are specific to particular asthma phenotypes and specific to ORMDL3. We examined associations between 244 independent single nucleotide polymorphisms (SNPs) plus 13 previously identified asthma-related SNPs in the region between 34 and 36 Mb on chromosome 17 and early wheezing phenotypes, doctor-diagnosed asthma and atopy at 7½ years, and bronchial hyperresponsiveness and lung function at 8½ years in 7045 children from the Avon Longitudinal Study of Parents and Children birth cohort study. With this, cis expression quantitative trait loci signals for the same SNPs were assessed in 875 samples across genes in the same region. The strongest evidence for phenotypic association was seen for persistent wheezing (rs8076131 near ORMDL3: relative risk ratio [RRR], 1.60 [95% CI, 1.40-1.84], P = 1.4 × 10(-11); rs2305480 near GSDML: RRR, 1.60 [95% CI, 1.39-1.83], P = 1.5 × 10(-11); and rs9303277 near IKZF3: RRR, 1.57 [95% CI, 1.37-1.79], P = 4.4 × 10(-11)). Similar but less precisely estimated effects were seen for intermediate-onset wheeze, but there was little evidence of associations with other wheezing phenotypes. There was some evidence of associations with bronchial hyperresponsiveness. SNPs across the whole region show strong evidence of association with differential levels of expression at GSDML, IKZF3, and MED24, as well as ORMDL3. Associations of SNPs in the 17q21 locus are specific to asthma and specific wheezing phenotypes and are not explained by associations with intermediate phenotypes, such as atopy or lung function.

    View details for DOI 10.1016/j.jaci.2012.09.021

    View details for Web of Science ID 000315587800008

    View details for PubMedID 23154084

  • Integrating GWAS and Expression Data for Functional Characterization of Disease-Associated SNPs: An Application to Follicular Lymphoma AMERICAN JOURNAL OF HUMAN GENETICS Conde, L., Bracci, P. M., Richardson, R., Montgomery, S. B., Skibola, C. F. 2013; 92 (1): 126-130


    Development of post-GWAS (genome-wide association study) methods is greatly needed for characterizing the function of trait-associated SNPs. Strategies integrating various biological data sets with GWAS results will provide insights into the mechanistic role of associated SNPs. Here, we present a method that integrates RNA sequencing (RNA-seq) and allele-specific expression data with GWAS data to further characterize SNPs associated with follicular lymphoma (FL). We investigated the influence on gene expression of three established FL-associated loci (rs10484561, rs2647012, and rs6457327) by measuring their correlation with human-leukocyte-antigen (HLA) expression levels obtained from publicly available RNA-seq expression data sets from lymphoblastoid cell lines. Our results suggest that SNPs linked to the protective variant rs2647012 exert their effect by a cis-regulatory mechanism involving modulation of HLA-DQB1 expression. In contrast, no effect on HLA expression was observed for the colocalized risk variant rs10484561. The application of integrative methods, such as those presented here, to other post-GWAS investigations will help identify causal disease variants and enhance our understanding of biological disease mechanisms.

    View details for DOI 10.1016/j.ajhg.2012.11.009

    View details for Web of Science ID 000313759000013

    View details for PubMedID 23246294

  • Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife Gutierrez-Arcelus, M., Lappalainen, T., Montgomery, S. B., Buil, A., Ongen, H., Yurovsky, A., Bryois, J., Giger, T., Romano, L., Planchon, A., Falconnet, E., Bielser, D., Gagnebin, M., Padioleau, I., Borel, C., Letourneau, A., Makrythanasis, P., Guipponi, M., Gehrig, C., Antonarakis, S. E., Dermitzakis, E. T. 2013; 2


    DNA methylation is an essential epigenetic mark whose role in gene regulation and its dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the association between DNA methylation and gene expression variation among individuals are likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and active role of DNA methylation to regulatory interactions influencing gene expression.

    View details for DOI 10.7554/eLife.00523

    View details for PubMedID 23755361

  • Performance of genomic medicine. Genome biology 2013; 14 (12): 316


    A report on the Cold Spring Harbor Laboratory meeting on Precision Medicine: Personal Genomes and Pharmacogenomics, held in Cold Spring Harbor, New York, USA, November 13-16, 2013.

    View details for DOI 10.1186/gb4146

    View details for PubMedID 24359965

  • Cancer Transcriptome Sequencing and Analysis Cancer Genomics: From Bench to Personalized Medicine Morin, R. D., Montgomery, S. B. Elsevier. 2013; 1: 31–49
  • Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge. PloS one Mostafavi, S., Battle, A., Zhu, X., Urban, A. E., Levinson, D., Montgomery, S. B., Koller, D. 2013; 8 (7)


    Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.

    View details for DOI 10.1371/journal.pone.0068141

    View details for PubMedID 23874524

  • Detection and impact of rare regulatory variants in human disease. Frontiers in genetics Li, X., Montgomery, S. B. 2013; 4: 67-?


    Advances in genome sequencing are providing unprecedented resolution of rare and private variants. However, methods which assess the effect of these variants have relied predominantly on information within coding sequences. Assessing their impact in non-coding sequences remains a significant contemporary challenge. In this review, we highlight the role of regulatory variation as causative agents and modifiers of monogenic disorders. We further discuss how advances in functional genomics are now providing new opportunity to assess the impact of rare non-coding variants and their role in disease.

    View details for DOI 10.3389/fgene.2013.00067

    View details for PubMedID 23755067

  • Sex-biased genetic effects on gene regulation in humans GENOME RESEARCH Dimas, A. S., Nica, A. C., Montgomery, S. B., Stranger, B. E., Raj, T., Buil, A., Giger, T., Lappalainen, T., Gutierrez-Arcelus, M., McCarthy, M. I., Dermitzakis, E. T. 2012; 22 (12): 2368-2375


    Human regulatory variation, reported as expression quantitative trait loci (eQTLs), contributes to differences between populations and tissues. The contribution of eQTLs to differences between sexes, however, has not been investigated to date. Here we explore regulatory variation in females and males and demonstrate that 12%-15% of autosomal eQTLs function in a sex-biased manner. We show that genes possessing sex-biased eQTLs are expressed at similar levels across the sexes and highlight cases of genes controlling sexually dimorphic and shared traits that are under the control of distinct regulatory elements in females and males. This study illustrates that sex provides important context that can modify the effects of functional genetic variants.

    View details for DOI 10.1101/gr.134981.111

    View details for Web of Science ID 000311895500005

    View details for PubMedID 22960374

  • Mapping cis- and trans-regulatory effects across multiple tissues in twins NATURE GENETICS Grundberg, E., Small, K. S., Hedman, A. K., Nica, A. C., Buil, A., Keildson, S., Bell, J. T., Yang, T., Meduri, E., Barrett, A., Nisbett, J., Sekowska, M., Wilk, A., Shin, S., Glass, D., Travers, M., Min, J. L., Ring, S., Ho, K., Thorleifsson, G., Kong, A., Thorsteindottir, U., Ainali, C., Dimas, A. S., Hassanali, N., Ingle, C., Knowles, D., Krestyaninova, M., Lowe, C. E., Di Meglio, P., Montgomery, S. B., Parts, L., Potter, S., Surdulescu, G., Tsaprouni, L., Tsoka, S., Bataille, V., Durbin, R., Nestle, F. O., O'Rahilly, S., Soranzo, N., Lindgren, C. M., Zondervan, K. T., Ahmadi, K. R., Schadt, E. E., Stefansson, K., Smith, G. D., McCarthy, M. I., Deloukas, P., Dermitzakis, E. T., Spector, T. D. 2012; 44 (10): 1084-?


    Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.

    View details for DOI 10.1038/ng.2394

    View details for Web of Science ID 000309550200006

    View details for PubMedID 22941192

  • Genotype-Based Test in Mapping Cis-Regulatory Variants from Allele-Specific Expression Data PLOS ONE Lefebvre, J. F., Vello, E., Ge, B., Montgomery, S. B., Dermitzakis, E. T., Pastinen, T., Labuda, D. 2012; 7 (6)


    Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which make them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning genotypes of individuals exhibiting AI and those not expressing AI in a 2×3 contingency table. The performance of this test to detect linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing the simulated and the empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests with the increasing distance separating the regulatory region from its regulated transcript. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need of haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.

    View details for DOI 10.1371/journal.pone.0038667

    View details for Web of Science ID 000305351700058

    View details for PubMedID 22685595

  • Patterns of Cis Regulatory Variation in Diverse Human Populations PLOS GENETICS Stranger, B. E., Montgomery, S. B., Dimas, A. S., Parts, L., Stegle, O., Ingle, C. E., Sekowska, M., Smith, G. D., Evans, D., Gutierrez-Arcelus, M., Price, A., Raj, T., Nisbett, J., Nica, A. C., Beazley, C., Durbin, R., Deloukas, P., Dermitzakis, E. T. 2012; 8 (4): 272-284


    The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
    Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.

    View details for DOI 10.1371/journal.pgen.1002639

    View details for Web of Science ID 000303441800020

    View details for PubMedID 22532805

  • A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes SCIENCE MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., Jostins, L., Habegger, L., Pickrell, J. K., Montgomery, S. B., Albers, C. A., Zhang, Z. D., Conrad, D. F., Lunter, G., Zheng, H., Ayub, Q., DePristo, M. A., Banks, E., Hu, M., Handsaker, R. E., Rosenfeld, J. A., Fromer, M., Jin, M., Mu, X. J., Khurana, E., Ye, K., Kay, M., Saunders, G. I., Suner, M., Hunt, T., Barnes, I. H., Amid, C., Carvalho-Silva, D. R., Bignell, A. H., Snow, C., Yngvadottir, B., Bumpstead, S., Cooper, D. N., Xue, Y., Romero, I. G., Wang, J., Li, Y., Gibbs, R. A., McCarroll, S. A., Dermitzakis, E. T., Pritchard, J. K., Barrett, J. C., Harrow, J., Hurles, M. E., Gerstein, M. B., Tyler-Smith, C. 2012; 335 (6070): 823-828


    Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.

    View details for DOI 10.1126/science.1215040

    View details for Web of Science ID 000300356400036

    View details for PubMedID 22344438

  • Meta-analysis of genome-wide association studies identifies three new risk loci for atopic dermatitis NATURE GENETICS Paternoster, L., Standl, M., Chen, C., Ramasamy, A., Bonnelykke, K., Duijts, L., Ferreira, M. A., Alves, A. C., Thyssen, J. P., Albrecht, E., Baurecht, H., Feenstra, B., Sleiman, P. M., Hysi, P., Warrington, N. M., Curjuric, I., Myhre, R., Curtin, J. A., Groen-Blokhuis, M. M., Kerkhof, M., Saaf, A., Franke, A., Ellinghaus, D., Foelster-Holst, R., Dermitzakis, E., Montgomery, S. B., Prokisch, H., Heim, K., Hartikainen, A., Pouta, A., Pekkanen, J., Blakemore, A. I., Buxton, J. L., Kaakinen, M., Duffy, D. L., Madden, P. A., Heath, A. C., Montgomery, G. W., Thompson, P. J., Matheson, M. C., Le Souef, P., St Pourcain, B., Smith, G. D., Henderson, J., Kemp, J. P., Timpson, N. J., Deloukas, P., Ring, S. M., Wichmann, H., Mueller-Nurasyid, M., Novak, N., Klopp, N., Rodriguez, E., McArdle, W., Linneberg, A., Menne, T., Nohr, E. A., Hofman, A., Uitterlinden, A. G., van Duijin, C. M., Rivadeneira, F., de Jongste, J. C., van der Valk, R. J., Wjst, M., Jogi, R., Geller, F., Boyd, H. A., Murray, J. C., Kim, C., Mentch, F., March, M., Mangino, M., Spector, T. D., Bataille, V., Pennell, C. E., Holt, P. G., Sly, P., Tiesler, C. M., Thiering, E., Illig, T., Imboden, M., Nystad, W., Simpson, A., Hottenga, J., Postma, D., Koppelman, G. H., Smit, H. A., Soderhall, C., Chawes, B., Kreiner-Moller, E., Bisgaard, H., Melen, E., Boomsma, D. I., Custovic, A., Jacobsson, B., Probst-Hensch, N. M., Palmer, L. J., Glass, D., Hakonarson, H., Melbye, M., Jarvis, D. L., Jaddoe, V. W., Gieger, C., Strachan, D. P., Martin, N. G., Jarvelin, M., Heinrich, J., Evans, D. M., Weidinger, S. 2012; 44 (2): 187-192


    Atopic dermatitis (AD) is a commonly occurring chronic skin disease with high heritability. Apart from filaggrin (FLG), the genes influencing atopic dermatitis are largely unknown. We conducted a genome-wide association meta-analysis of 5,606 affected individuals and 20,565 controls from 16 population-based cohorts and then examined the ten most strongly associated new susceptibility loci in an additional 5,419 affected individuals and 19,833 controls from 14 studies. Three SNPs reached genome-wide significance in the discovery and replication cohorts combined, including rs479844 upstream of OVOL1 (odds ratio (OR) = 0.88, P = 1.1 × 10(-13)) and rs2164983 near ACTL9 (OR = 1.16, P = 7.1 × 10(-9)), both of which are near genes that have been implicated in epidermal proliferation and differentiation, as well as rs2897442 in KIF3A within the cytokine cluster at 5q31.1 (OR = 1.11, P = 3.8 × 10(-8)). We also replicated association with the FLG locus and with two recently identified association signals at 11q13.5 (rs7927894; P = 0.008) and 20q13.33 (rs6010620; P = 0.002). Our results underline the importance of both epidermal barrier function and immune dysregulation in atopic dermatitis pathogenesis.

    View details for DOI 10.1038/ng.1017

    View details for Web of Science ID 000299664400018

    View details for PubMedID 22197932

  • DNA methylation profiles of human active and inactive X chromosomes GENOME RESEARCH Sharp, A. J., Stathaki, E., Migliavacca, E., Brahmachary, M., Montgomery, S. B., Dupre, Y., Antonarakis, S. E. 2011; 21 (10): 1592-1600


    X-chromosome inactivation (XCI) is a dosage compensation mechanism that silences the majority of genes on one X chromosome in each female cell. To characterize epigenetic changes that accompany this process, we measured DNA methylation levels in 45,X patients carrying a single active X chromosome (X(a)), and in normal females, who carry one X(a) and one inactive X (X(i)). Methylated DNA was immunoprecipitated and hybridized to high-density oligonucleotide arrays covering the X chromosome, generating epigenetic profiles of active and inactive X chromosomes. We observed that XCI is accompanied by changes in DNA methylation specifically at CpG islands (CGIs). While the majority of CGIs show increased methylation levels on the X(i), XCI actually results in significant reductions in methylation at 7% of CGIs. Both intra- and inter-genic CGIs undergo epigenetic modification, with the biggest increase in methylation occurring at the promoters of genes silenced by XCI. In contrast, genes escaping XCI generally have low levels of promoter methylation, while genes that show inter-individual variation in silencing show intermediate increases in methylation. Thus, promoter methylation and susceptibility to XCI are correlated. We also observed a global correlation between CGI methylation and the evolutionary age of X-chromosome strata, and that genes escaping XCI show increased methylation within gene bodies. We used our epigenetic map to predict 26 novel genes escaping XCI, and searched for parent-of-origin-specific methylation differences, but found no evidence to support imprinting on the human X chromosome. Our study provides a detailed analysis of the epigenetic profile of active and inactive X chromosomes.

    View details for DOI 10.1101/gr.112680.110

    View details for Web of Science ID 000295407800004

    View details for PubMedID 21862626

  • Epistatic Selection between Coding and Regulatory Variation in Human Evolution and Disease AMERICAN JOURNAL OF HUMAN GENETICS Lappalainen, T., Montgomery, S. B., Nica, A. C., Dermitzakis, E. T. 2011; 89 (3): 459-463


    Interaction (nonadditive effects) between genetic variants has been highlighted as an important mechanism underlying phenotypic variation, but the discovery of genetic interactions in humans has proved difficult. In this study, we show that the spectrum of variation in the human genome has been shaped by modifier effects of cis-regulatory variation on the functional impact of putatively deleterious protein-coding variants. We analyzed 1000 Genomes population-scale resequencing data from Europe (CEU [Utah residents with Northern and Western European ancestry from the CEPH collection]) and Africa (YRI [Yoruba in Ibadan, Nigeria]) together with gene expression data from arrays and RNA sequencing for the same samples. We observed an underrepresentation of derived putatively functional coding variation on the more highly expressed regulatory haplotype, which suggests stronger purifying selection against deleterious coding variants that have increased penetrance because of their regulatory background. Furthermore, the frequency spectrum and impact size distribution of common regulatory polymorphisms (eQTLs) appear to be shaped in order to minimize the selective disadvantage of having deleterious coding mutations on the more highly expressed haplotype. Interestingly, eQTLs explaining common disease GWAS signals showed an enrichment of putative epistatic effects, suggesting that some disease associations might arise from interactions increasing the penetrance of rare coding variants. In conclusion, our results indicate that regulatory and coding variants often modify the functional impact of each other. This specific type of genetic interaction is detectable from sequencing data in a genome-wide manner, and characterizing these joint effects might help us understand functional mechanisms behind genetic associations to human phenotypes, including both Mendelian and common disease.

    View details for DOI 10.1016/j.ajhg.2011.08.004

    View details for Web of Science ID 000294939800012

    View details for PubMedID 21907014

  • Rare and Common Regulatory Variation in Population-Scale Sequenced Human Genomes PLOS GENETICS Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M., Dermitzakis, E. T. 2011; 7 (7)


    Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

    View details for DOI 10.1371/journal.pgen.1002144

    View details for Web of Science ID 000293338600007

    View details for PubMedID 21811411

  • Genome-wide association study identifies a common variant associated with risk of endometrial cancer NATURE GENETICS Spurdle, A. B., Thompson, D. J., Ahmed, S., Ferguson, K., Healey, C. S., O'Mara, T., Walker, L. C., Montgomery, S. B., Dermitzakis, E. T., Fahey, P., Montgomery, G. W., Webb, P. M., Fasching, P. A., Beckmann, M. W., Ekici, A. B., Hein, A., Lambrechts, D., Coenegrachts, L., Vergote, I., Amant, F., Salvesen, H. B., Trovik, J., Njolstad, T. S., Helland, H., Scott, R. J., Ashton, K., Proietto, T., Otton, G., Tomlinson, I., Gorman, M., Howarth, K., Hodgson, S., Garcia-Closas, M., Wentzensen, N., Yang, H., Chanock, S., Hall, P., Czene, K., Liu, J., Li, J., Shu, X., Zheng, W., Long, J., Xiang, Y., Shah, M., Morrison, J., Michailidou, K., Pharoah, P. D., Dunning, A. M., Easton, D. F. 2011; 43 (5): 451-?


    Endometrial cancer is the most common malignancy of the female genital tract in developed countries. To identify genetic variants associated with endometrial cancer risk, we performed a genome-wide association study involving 1,265 individuals with endometrial cancer (cases) from Australia and the UK and 5,190 controls from the Wellcome Trust Case Control Consortium. We compared genotype frequencies in cases and controls for 519,655 SNPs. Forty seven SNPs that showed evidence of association with endometrial cancer in stage 1 were genotyped in 3,957 additional cases and 6,886 controls. We identified an endometrial cancer susceptibility locus close to HNF1B at 17q12 (rs4430796, P = 7.1 × 10(-10)) that is also associated with risk of prostate cancer and is inversely associated with risk of type 2 diabetes.

    View details for DOI 10.1038/ng.812

    View details for Web of Science ID 000289972600015

    View details for PubMedID 21499250

  • From expression QTLs to personalized transcriptomics NATURE REVIEWS GENETICS Montgomery, S. B., Dermitzakis, E. T. 2011; 12 (4): 277-282


    Approaches that combine expression quantitative trait loci (eQTLs) and genome-wide association (GWA) studies are offering new functional information about the aetiology of complex human traits and diseases. Improved study designs (which take into account technological advances in resolving the transcriptome, cell history and state, population of origin and diverse endophenotypes) are providing insights into the architecture of disease and the landscape of gene regulation in humans. Furthermore, these advances are helping to establish links between cellular effects and organismal traits.

    View details for DOI 10.1038/nrg2969

    View details for Web of Science ID 000288531700011

    View details for PubMedID 21386863

  • The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study PLOS GENETICS Nica, A. C., Parts, L., Glass, D., Nisbet, J., Barrett, A., Sekowska, M., Travers, M., Potter, S., Grundberg, E., Small, K., Hedman, A. K., Bataille, V., Bell, J. T., Surdulescu, G., Dimas, A. S., Ingle, C., Nestle, F. O., Di Meglio, P., Min, J. L., Wilk, A., Hammond, C. J., Hassanali, N., Yang, T., Montgomery, S. B., O'Rahilly, S., Lindgren, C. M., Zondervan, K. T., Soranzo, N., Barroso, I., Durbin, R., Ahmadi, K., Deloukas, P., McCarthy, M. I., Dermitzakis, E. T., Spector, T. D. 2011; 7 (2)


    While there have been studies exploring regulatory variation in one or more tissues, the complexity of tissue-specificity in multiple primary tissues is not yet well understood. We explore in depth the role of cis-regulatory variation in three human tissues: lymphoblastoid cell lines (LCL), skin, and fat. The samples (156 LCL, 160 skin, 166 fat) were derived simultaneously from a subset of well-phenotyped healthy female twins of the MuTHER resource. We discover an abundance of cis-eQTLs in each tissue similar to previous estimates (858 or 4.7% of genes). In addition, we apply factor analysis (FA) to remove effects of latent variables, thus more than doubling the number of our discoveries (1,822 eQTL genes). The unique study design (Matched Co-Twin Analysis--MCTA) permits immediate replication of eQTLs using co-twins (93%-98%) and validation of the considerable gain in eQTL discovery after FA correction. We highlight the challenges of comparing eQTLs between tissues. After verifying previous significance threshold-based estimates of tissue-specificity, we show their limitations given their dependency on statistical power. We propose that continuous estimates of the proportion of tissue-shared signals and direct comparison of the magnitude of effect on the fold change in expression are essential properties that jointly provide a biologically realistic view of tissue-specificity. Under this framework we demonstrate that 30% of eQTLs are shared among the three tissues studied, while another 29% appear exclusively tissue-specific. However, even among the shared eQTLs, a substantial proportion (10%-20%) have significant differences in the magnitude of fold change between genotypic classes across tissues. Our results underline the need to account for the complexity of eQTL tissue-specificity in an effort to assess consequences of such variants for complex traits.

    View details for DOI 10.1371/journal.pgen.1002003

    View details for Web of Science ID 000287697300035

    View details for PubMedID 21304890

  • Identification of cis- and trans- regulatory variation modulating microRNA expression levels in human fibroblasts GENOME RESEARCH Borel, C., Deutsch, S., Letourneau, A., Migliavacca, E., Montgomery, S. B., Dimas, A. S., Vejnar, C. E., Attar, H., Gagnebin, M., Gehrig, C., Falconnet, E., Dupre, Y., Dermitzakis, E. T., Antonarakis, S. E. 2011; 21 (1): 68-73


    MicroRNAs (miRNAs) are regulatory noncoding RNAs that affect the production of a significant fraction of human mRNAs via post-transcriptional regulation. Interindividual variation of the miRNA expression levels is likely to influence the expression of miRNA target genes and may therefore contribute to phenotypic differences in humans, including susceptibility to common disorders. The extent to which miRNA levels are genetically controlled is largely unknown. In this report, we assayed the expression levels of miRNAs in primary fibroblasts from 180 European newborns of the GenCord project and performed association analysis to identify eQTLs (expression quantitative trait loci). We detected robust expression for 121 miRNAs out of 365 interrogated. We have identified significant cis- (10%) and trans- (11%) eQTLs. Furthermore, we detected one genomic locus (rs1522653) that influences the expression levels of five miRNAs, thus unraveling a novel mechanism for coregulation of miRNA expression.

    View details for DOI 10.1101/gr.109371.110

    View details for Web of Science ID 000285868300007

    View details for PubMedID 21147911

  • A map of human genome variation from population-scale sequencing NATURE Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., De La Vega, F. M., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D., Peltonen, L., Schafer, A. J., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E. S., Altshuler, D. L., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Jaffe, D. B., Shefler, E., Sougnez, C. L., Bentley, D. R., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K. J., Costa, G. L., Ichikawa, J. K., Lee, C. C., Sudbrak, R., Lehrach, H., Borodina, T. A., Dahl, A., Davydov, A. N., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A. V., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R. M., Burton, J., Carter, D. M., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H. P., Turner, D., De Witte, A., Giles, S., Gibbs, R. A., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Yang, H., Marth, G. T., Garrison, E. 
P., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., Daly, M. J., DePristo, M. A., Altshuler, D. L., Ball, A. D., Banks, E., Bloom, T., Browning, B. L., Cibulskis, K., Fennell, T. J., Garimella, K. V., Grossman, S. R., Handsaker, R. E., Hanna, M., Hartl, C., Jaffe, D. B., Kernytsky, A. M., Korn, J. M., Li, H., Maguire, J. R., McCarroll, S. A., McKenna, A., Nemesh, J. C., Philippakis, A. A., Poplin, R. E., Price, A., Rivas, M. A., Sabeti, P. C., Schaffner, S. F., Shefler, E., Shlyakhter, I. A., Cooper, D. N., Ball, E. V., Mort, M., Phillips, A. D., Stenson, P. D., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Bustamante, C. D., Clark, A. G., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R. N., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R. E., Zalunin, V., Zheng-Bradley, X., Korbel, J. O., Stuetz, A. M., Humphray, S., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Hyland, F. C., Manning, J. M., McLaughlin, S. F., Peckham, H. E., Sakarya, O., Sun, Y. A., Tsung, E. F., Batzer, M. A., Konkel, M. K., Walker, J. A., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Herwig, R., Parkhomchuk, D. V., Sherry, S. T., Agarwala, R., Khouri, H., Morgulis, A. O., Paschall, J. E., Phan, L. D., Rotmistrovsky, K. E., Sanders, R. D., Shumway, M. F., Xiao, C., McVean, G. A., Auton, A., Iqbal, Z., Lunter, G., Marchini, J. L., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D. W., Beckstrom-Sternberg, S. M., Christoforides, A., Kurdoglu, A. A., Pearson, J., Sinari, S. A., Tembe, W. D., Haussler, D., Hinrichs, A. S., Katzman, S. J., Kern, A., Kuhn, R. 
M., Przeworski, M., Hernandez, R. D., Howie, B., Kelley, J. L., Melton, S. C., Abecasis, G. R., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W. O., Ding, J., Kang, H. M., Lathrop, M., Liang, L., Moffatt, M. F., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zoellner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E. A., Zilversmit, M., Jorde, L., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Sahinalp, S. C., Sudmant, P. H., Mardis, E. R., Chen, K., Chinwalla, A., Ding, L., Koboldt, D. C., McLellan, M. D., Dooling, D., Weinstock, G., Wallis, J. W., Wendl, M. C., Zhang, Q., Durbin, R. M., Albers, C. A., Ayub, Q., Balasubramaniam, S., Barrett, J. C., Carter, D. M., Chen, Y., Conrad, D. F., Danecek, P., Dermitzakis, E. T., Hu, M., Huang, N., Hurles, M. E., Jin, H., Jostins, L., Keane, T. M., Keane, T. M., Le, S. Q., Lindsay, S., Long, Q., MacArthur, D. G., Montgomery, S. B., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Li, Y., Luo, R., Marth, G. T., Garrison, E. P., Kural, D., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., McCarroll, S. A., Banks, E., DePristo, M. A., Handsaker, R. E., Hartl, C., Korn, J. M., Li, H., Nemesh, J. C., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R. E., Zheng-Bradley, X., Korbel, J. O., Humphray, S., Cheetham, R. K., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Peckham, H. E., Sun, Y. A., Batzer, M. A., Konkel, M. K., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. 
M., Chen, K., Chinwalla, A., Ding, L., McLellan, M. D., Wallis, J. W., Hurles, M. E., Conrad, D. F., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Gibbs, R. A., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G. T., Garrison, E. P., Indap, A., Leong, W. F., Quinlan, A. R., Stewart, C., Ward, A. N., Wu, J., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Garimella, K. V., Hartl, C., Shefler, E., Sougnez, C. L., Wilkinson, J., Clark, A. G., Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R. E., Zheng-Bradley, X., Sherry, S. T., Khouri, H. M., Paschall, J. E., Shumway, M. F., Xiao, C., McVean, G. A., Katzman, S. J., Abecasis, G. R., Blackwell, T., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Koboldt, D. C., Durbin, R. M., Balasubramaniam, S., Coffey, A., Keane, T. M., MacArthur, D. G., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M. B., Balasubramanian, S., Chakravarti, A., Knoppers, B. M., Peltonen, L., Abecasis, G. R., Bustamante, C. D., Gharani, N., Gibbs, R. A., Jorde, L., Kaye, J. S., Kent, A., Li, T., McGuire, A. L., McVean, G. A., Ossorio, P. N., Rotimi, C. N., Su, Y., Toji, L. H., Tyler-Smith, C., Brooks, L. D., Felsenfeld, A. L., McEwen, J. E., Abdallah, A., Juenger, C. R., Clemm, N. C., Collins, F. S., Duncanson, A., Green, E. D., Guyer, M. S., Peterson, J. L., Schafer, A. J., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., Hurles, M. E., McVean, G. A. 2010; 467 (7319): 1061-1073


    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    View details for DOI 10.1038/nature09534

    View details for Web of Science ID 000283548600039

  • Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies BIOINFORMATICS Yang, T., Beazley, C., Montgomery, S. B., Dimas, A. S., Gutierrez-Arcelus, M., Stranger, B. E., Deloukas, P., Dermitzakis, E. T. 2010; 26 (19): 2474-2476


    Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols.

    View details for DOI 10.1093/bioinformatics/btq452

    View details for Web of Science ID 000282170000023

    View details for PubMedID 20702402

  • Integrating common and rare genetic variation in diverse human populations NATURE Altshuler, D. M., Gibbs, R. A., Peltonen, L., Dermitzakis, E., Schaffner, S. F., Yu, F., Bonnen, P. E., de Bakker, P. I., Deloukas, P., Gabriel, S. B., Gwilliam, R., Hunt, S., Inouye, M., Jia, X., Palotie, A., Parkin, M., Whittaker, P., Chang, K., Hawes, A., Lewis, L. R., Ren, Y., Wheeler, D., Muzny, D. M., Barnes, C., Darvishi, K., Hurles, M., Korn, J. M., Kristiansson, K., Lee, C., McCarroll, S. A., Nemesh, J., Keinan, A., Montgomery, S. B., Pollack, S., Price, A. L., Soranzo, N., Gonzaga-Jauregui, C., Anttila, V., Brodeur, W., Daly, M. J., Leslie, S., McVean, G., Moutsianas, L., Nguyen, H., Zhang, Q., Ghori, M. J., McGinnis, R., McLaren, W., Takeuchi, F., Grossman, S. R., Shlyakhter, I., Hostetter, E. B., Sabeti, P. C., Adebamowo, C. A., Foster, M. W., Gordon, D. R., Licinio, J., Manca, M. C., Marshall, P. A., Matsuda, I., Ngare, D., Wang, V. O., Reddy, D., Rotimi, C. N., Royal, C. D., Sharp, R. R., Zeng, C., Brooks, L. D., McEwen, J. E. 2010; 467 (7311): 52-58


    Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

    View details for DOI 10.1038/nature09298

    View details for Web of Science ID 000281461200033

    View details for PubMedID 20811451

  • Transcriptome genetics using second generation sequencing in a Caucasian population NATURE Montgomery, S. B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R. P., Ingle, C., Nisbett, J., Guigo, R., Dermitzakis, E. T. 2010; 464 (7289): 773-U151


    Gene expression is an important phenotype that informs about genetic and environmental effects on cellular state. Many studies have previously identified genetic variants for gene expression phenotypes using custom and commercially available microarrays. Second generation sequencing technologies are now providing unprecedented access to the fine structure of the transcriptome. We have sequenced the mRNA fraction of the transcriptome in 60 extended HapMap individuals of European descent and have combined these data with genetic variants from the HapMap3 project. We have quantified exon abundance based on read depth and have also developed methods to quantify whole transcript abundance. We have found that approximately 10 million reads of sequencing can provide access to the same dynamic range as arrays with better quantification of alternative and highly abundant transcripts. Correlation with SNPs (single nucleotide polymorphisms) leads to a larger discovery of eQTLs (expression quantitative trait loci) than with arrays. We also detect a substantial number of variants that influence the structure of mature transcripts indicating variants responsible for alternative splicing. Finally, measures of allele-specific expression allowed the identification of rare eQTLs and allelic differences in transcript structure. This analysis shows that high throughput sequencing technologies reveal new properties of genetic effects on the transcriptome and allow the exploration of genetic effects in cellular processes.

    View details for DOI 10.1038/nature08903

    View details for Web of Science ID 000276205000048

    View details for PubMedID 20220756

  • Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations PLOS GENETICS Nica, A. C., Montgomery, S. B., Dimas, A. S., Stranger, B. E., Beazley, C., Barroso, I., Dermitzakis, E. T. 2010; 6 (4)


    The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC) that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with both a single or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r(2)) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal to prioritize relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.

    View details for DOI 10.1371/journal.pgen.1000895

    View details for Web of Science ID 000277354200012

    View details for PubMedID 20369022

  • Out of the sequencer and into the wiki as we face new challenges in genome informatics. Genome biology Ning, Z., Montgomery, S. B. 2010; 11 (10): 308


    A report on the joint Cold Spring Harbor Laboratory/Wellcome Trust Conference 'Genome Informatics', 15-19 September 2010, Hinxton, Cambridge, UK.

    View details for DOI 10.1186/gb-2010-11-10-308

    View details for PubMedID 21067526

  • Annotating the regulatory genome. Methods in molecular biology (Clifton, N.J.) Montgomery, S. B., Kasaian, K., Jones, S. J., Griffith, O. L. 2010; 674: 313-349


    Determining the timing and molecular repertoire responsible for gene expression is fundamental to understanding a gene's function. Heritable differences in this character are increasingly regarded as explanatory for complex and common traits. For many known trait-predisposing genes, studies have sought to elucidate the associated logic behind gene regulation. However, there exist many challenges in deciphering these mechanisms. Among them, it is recognized that we have limited understanding of regulatory complexity, the current models of gene regulation have low specificity and any gene's regulatory logic is dependent on biological context. Addressing these limitations and defining the regulatory genome is an ongoing challenge for molecular biology. We discuss current efforts to define and annotate the regulatory genome by focusing on curation and text-mining activities. We further highlight the type of information and curation process for describing regulatory elements within the ORegAnno database and how the general standards for such information are changing.

    View details for DOI 10.1007/978-1-60761-854-6_20

    View details for PubMedID 20827601

  • The resolution of the genetics of gene expression HUMAN MOLECULAR GENETICS Montgomery, S. B., Dermitzakis, E. T. 2009; 18: R211-R215


    Understanding the influence of genetics on the molecular mechanisms underpinning human phenotypic diversity is fundamental to being able to predict health outcomes and treat disease. To interrogate the role of genetics on cellular state and function, gene expression has been extensively used. Past and present studies have highlighted important patterns of heritability, population differentiation and tissue-specificity in gene expression. Current and future studies are taking advantage of systems biology-based approaches and advances in sequencing technology: new methodology aims to translate regulatory networks to enrich pathways responsible for disease etiology and 2nd generation sequencing now offers single-molecular resolution of the transcriptome providing unprecedented information on the structural and genetic characteristics of gene expression. Such advances are leading to a future where rich cellular phenotypes will facilitate understanding of the transmission of genetic effect from the gene to organism.

    View details for DOI 10.1093/hmg/ddp400

    View details for Web of Science ID 000271265600012

    View details for PubMedID 19808798

  • Common Regulatory Variation Impacts Gene Expression in a Cell Type-Dependent Manner SCIENCE Dimas, A. S., Deutsch, S., Stranger, B. E., Montgomery, S. B., Borel, C., Attar-Cohen, H., Ingle, C., Beazley, C., Arcelus, M. G., Sekowska, M., Gagnebin, M., Nisbett, J., Deloukas, P., Dermitzakis, E. T., Antonarakis, S. E. 2009; 325 (5945): 1246-1250


    Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type-specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type-specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type-specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

    View details for DOI 10.1126/science.1174148

    View details for Web of Science ID 000269523200038

    View details for PubMedID 19644074

  • Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants? DIABETOLOGIA Southam, L., Soranzo, N., Montgomery, S. B., Frayling, T. M., McCarthy, M. I., Barroso, I., Zeggini, E. 2009; 52 (9): 1846-1851


    According to the thrifty genotype hypothesis, the high prevalence of type 2 diabetes and obesity is a consequence of genetic variants that have undergone positive selection during historical periods of erratic food supply. The recent expansion in the number of validated type 2 diabetes- and obesity-susceptibility loci, coupled with access to empirical data, enables us to look for evidence in support (or otherwise) of the thrifty genotype hypothesis using proven loci. We employed a range of tests to obtain complementary views of the evidence for selection: we determined whether the risk allele at associated 'index' single-nucleotide polymorphisms is derived or ancestral, calculated the integrated haplotype score (iHS) and assessed the population differentiation statistic fixation index (F_ST) for 17 type 2 diabetes and 13 obesity loci. We found no evidence for significant differences for the derived/ancestral allele test. None of the studied loci showed strong evidence for selection based on the iHS score. We find a high F_ST for rs7901695 at TCF7L2, the largest type 2 diabetes effect size found to date. Our results provide some evidence for selection at specific loci, but there are no consistent patterns of selection that provide conclusive confirmation of the thrifty genotype hypothesis. Discovery of more signals and more causal variants for type 2 diabetes and obesity is likely to allow more detailed examination of these issues.

    View details for DOI 10.1007/s00125-009-1419-3

    View details for Web of Science ID 000268776100018

    View details for PubMedID 19526209

  • Current computational methods for prioritizing candidate regulatory polymorphisms. Methods in molecular biology (Clifton, N.J.) Montgomery, S. 2009; 569: 89-114


    Discovery of DNA sequence variants responsible for human phenotypic variation is key to advances in molecular diagnostics and medicines. Historically, variants that alter the protein-coding sequence of genes have been targeted when attempting to identify a trait's etiology; this is done because the rules governing these regions are generally well-understood and candidate variants can be easily selected. However, the effects of variants on gene regulation are increasingly regarded as being as important as protein-coding variation in uncovering the nature of phenotypic variation. I discuss resources and methodology that have recently been developed to computationally prioritize variants that may alter gene expression.

    View details for DOI 10.1007/978-1-59745-524-4_5

    View details for PubMedID 19623487

  • ORegAnno: an open-access community-driven resource for regulatory annotation NUCLEIC ACIDS RESEARCH Griffith, O. L., Montgomery, S. B., Bernier, B., Chu, B., Kasaian, K., Aerts, S., Mahony, S., Sleumer, M. C., Bilenky, M., Haeussler, M., Griffith, M., Gallo, S. M., Giardine, B., Hooghe, B., Van Loo, P., Blanco, E., Ticoll, A., Lithwick, S., Portales-Casamar, E., Donaldson, I. J., Robertson, G., Wadelius, C., De Bleser, P., Vlieghe, D., Halfon, M. S., Wasserman, W., Hardison, R., Bergman, C. M., Jones, S. J. 2008; 36: D107-D113


    ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the 'publication queue' allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or 'check out' papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services.

    View details for DOI 10.1093/nar/gkm967

    View details for Web of Science ID 000252545400020

    View details for PubMedID 18006570

  • Text-mining assisted regulatory annotation GENOME BIOLOGY Aerts, S., Haeussler, M., Van Vooren, S., Griffith, O. L., Hulpiau, P., Jones, S. J., Montgomery, S. B., Bergman, C. M. 2008; 9 (2)


    Decoding transcriptional regulatory networks and the genomic cis-regulatory logic implemented in their control nodes is a fundamental challenge in genome biology. High-throughput computational and experimental analyses of regulatory networks and sequences rely heavily on positive control data from prior small-scale experiments, but the vast majority of previously discovered regulatory data remains locked in the biomedical literature. We develop text-mining strategies to identify relevant publications and extract sequence information to assist the regulatory annotation process. Using a vector space model to identify Medline abstracts from papers likely to have high cis-regulatory content, we demonstrate that document relevance ranking can assist the curation of transcriptional regulatory networks and estimate that, minimally, 30,000 papers harbor unannotated cis-regulatory data. In addition, we show that DNA sequences can be extracted from primary text with high cis-regulatory content and mapped to genome sequences as a means of identifying the location, organism and target gene information that is critical to the cis-regulatory annotation process. Our results demonstrate that text-mining technologies can be successfully integrated with genome annotation systems, thereby increasing the availability of annotated cis-regulatory data needed to catalyze advances in the field of gene regulation.

    View details for DOI 10.1186/gb-2008-9-2-r31

    View details for Web of Science ID 000254659300013

    View details for PubMedID 18271954

  • Population genomics of human gene expression NATURE GENETICS Stranger, B. E., Nica, A. C., Forrest, M. S., Dimas, A., Bird, C. P., Beazley, C., Ingle, C. E., Dunning, M., Flicek, P., Koller, D., Montgomery, S., Tavare, S., Deloukas, P., Dermitzakis, E. T. 2007; 39 (10): 1217-1224


    Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.

    View details for DOI 10.1038/ng2142

    View details for Web of Science ID 000249737400017

    View details for PubMedID 17873874

  • A survey of genomic properties for the detection of regulatory polymorphisms PLOS COMPUTATIONAL BIOLOGY Montgomery, S. B., Griffith, O. L., Schuetz, J. M., Brooks-Wilson, A., Jones, S. J. 2007; 3 (6): 1000-1010


    Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques has been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database. We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.

    View details for DOI 10.1371/journal.pcbi.0030106

    View details for Web of Science ID 000249105500010

    View details for PubMedID 17559298

  • ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation BIOINFORMATICS Montgomery, S. B., Griffith, O. L., Sleumer, M. C., Bergman, C. M., Bilenky, M., Pleasance, E. D., Prychyna, Y., Zhang, X., Jones, S. J. 2006; 22 (5): 637-640


    Our understanding of gene regulation is currently limited by our ability to collectively synthesize and catalogue transcriptional regulatory elements stored in scientific literature. Over the past decade, this task has become increasingly challenging as the accrual of biologically validated regulatory sequences has accelerated. To meet this challenge, novel community-based approaches to regulatory element annotation are required. Here, we present the Open Regulatory Annotation (ORegAnno) database as a dynamic collection of literature-curated regulatory regions, transcription factor binding sites and regulatory mutations (polymorphisms and haplotypes). ORegAnno has been designed to manage the submission, indexing and validation of new annotations from users worldwide. Submissions to ORegAnno are immediately cross-referenced to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed, where appropriate. ORegAnno is available directly through MySQL, Web services, and online. All software is licensed under the Lesser GNU Public License (LGPL).

    View details for DOI 10.1093/bioinformatics/btk027

    View details for Web of Science ID 000235604400024

    View details for PubMedID 16397004

  • cisRED: a database system for genome-scale computational discovery of regulatory elements NUCLEIC ACIDS RESEARCH Robertson, G., Bilenky, M., Lin, K., He, A., Yuen, W., Dagpinar, M., Varhol, R., Teague, K., Griffith, O. L., Zhang, X., Pan, Y., Hassel, M., Sleumer, M. C., Pan, W., Pleasance, E. D., Chuang, M., Hao, H., Li, Y. Y., Robertson, N., Fjell, C., Li, B., Montgomery, S. B., Astakhova, T., Zhou, J., Sander, J., Siddiqui, A. S., Jones, S. J. 2006; 34: D68-D73


    We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system. The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded or installed locally.

    View details for DOI 10.1093/nar/gkj075

    View details for Web of Science ID 000239307700015

    View details for PubMedID 16381958

  • An application of peer-to-peer technology to the discovery, use and assessment of bioinformatics programs NATURE METHODS Montgomery, S. B., Fu, T., Guan, J., Lin, K., Jones, S. J. 2005; 2 (8): 563-563

    View details for Web of Science ID 000230884500002

    View details for PubMedID 16094378

  • Sockeye: A 3D environment for comparative genomics GENOME RESEARCH Montgomery, S. B., Astakhova, T., Bilenky, M., Birney, E., Fu, T., Hassel, M., Melsopp, C., Rak, M., Robertson, A. G., Sleumer, M., Siddiqui, A. S., Jones, S. J. 2004; 14 (5): 956-962


    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.

    View details for DOI 10.1101/gr.1890304

    View details for Web of Science ID 000221171700022

    View details for PubMedID 15123592

  • The genome sequence of the SARS-associated coronavirus SCIENCE Marra, M. A., Jones, S. J., Astell, C. R., Holt, R. A., Brooks-Wilson, A., Butterfield, Y. S., Khattra, J., Asano, J. K., Barber, S. A., Chan, S. Y., Cloutier, A., Coughlin, S. M., Freeman, D., Girn, N., Griffith, O. L., Leach, S. R., Mayo, M., McDonald, H., Montgomery, S. B., Pandoh, P. K., Petrescu, A. S., Robertson, A. G., Schein, J. E., Siddiqui, A., Smailus, D. E., Stott, J. E., Yang, G. S., Plummer, F., Andonov, A., Artsob, H., Bastien, N., Bernard, K., Booth, T. F., Bowness, D., Czub, M., Drebot, M., Fernando, L., Flick, R., Garbutt, M., Gray, M., Grolla, A., Jones, S., Feldmann, H., Meyers, A., Kabani, A., Li, Y., Normand, S., Stroher, U., Tipples, G. A., Tyler, S., Vogrig, R., Ward, D., Watson, B., Brunham, R. C., Krajden, M., Petric, M., Skowronski, D. M., Upton, C., Roper, R. L. 2003; 300 (5624): 1399-1404


    We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.

    View details for DOI 10.1126/science.1085953

    View details for Web of Science ID 000183181800036

    View details for PubMedID 12730501