Professional Education


  • Doctor of Philosophy, Stanford University, BIOM-PHD (2018)
  • Master of Science, Stanford University, BIOM-MS (2018)

All Publications


  • Discovery of common and rare genetic risk variants for colorectal cancer NATURE GENETICS Huyghe, J. R., Bien, S. A., Harrison, T. A., Kang, H., Chen, S., Schmit, S. L., Conti, D. V., Qu, C., Jeon, J., Edlund, C. K., Greenside, P., Wainberg, M., Schumacher, F. R., Smith, J. D., Levine, D. M., Nelson, S. C., Sinnott-Armstrong, N. A., Albanes, D., Alonso, M., Anderson, K., Arnau-Collell, C., Arndt, V., Bamia, C., Banbury, B. L., Baron, J. A., Berndt, S. I., Bezieau, S., Bishop, D., Boehm, J., Boeing, H., Brenner, H., Brezina, S., Buch, S., Buchanan, D. D., Burnett-Hartman, A., Butterbach, K., Caan, B. J., Campbell, P. T., Carlson, C. S., Castellvi-Bel, S., Chan, A. T., Chang-Claude, J., Chanock, S. J., Chirlaque, M., Cho, S., Connolly, C. M., Cross, A. J., Cuk, K., Curtis, K. R., de la Chapelle, A., Doheny, K. F., Duggan, D., Easton, D. F., Elias, S. G., Elliott, F., English, D. R., Feskens, E. M., Figueiredo, J. C., Fischer, R., FitzGerald, L. M., Forman, D., Gala, M., Gallinger, S., Gauderman, W., Giles, G. G., Gillanders, E., Gong, J., Goodman, P. J., Grady, W. M., Grove, J. S., Gsur, A., Gunter, M. J., Haile, R. W., Hampe, J., Hampel, H., Harlid, S., Hayes, R. B., Hofer, P., Hoffmeister, M., Hopper, J. L., Hsu, W., Huang, W., Hudson, T. J., Hunter, D. J., Ibanez-Sanz, G., Idos, G. E., Ingersoll, R., Jackson, R. D., Jacobs, E. J., Jenkins, M. A., Joshi, A. D., Joshu, C. E., Keku, T. O., Key, T. J., Kim, H., Kobayashi, E., Kolonel, L. N., Kooperberg, C., Kuehn, T., Kury, S., Kweon, S., Larsson, S. C., Laurie, C. A., Le Marchand, L., Leal, S. M., Lee, S., Lejbkowicz, F., Lemire, M., Li, C. I., Li, L., Lieb, W., Lin, Y., Lindblom, A., Lindor, N. M., Ling, H., Louie, T. L., Mannisto, S., Markowitz, S. D., Martin, V., Masala, G., McNeil, C. E., Melas, M., Milne, R. L., Moreno, L., Murphy, N., Myte, R., Naccarati, A., Newcomb, P. A., Offit, K., Ogino, S., Onland-Moret, N., Pardini, B., Parfrey, P. S., Pearlman, R., Perduca, V., Pharoah, P. P., Pinchev, M., Platz, E. A., Prentice, R. L., Pugh, E., Raskin, L., Rennert, G., Rennert, H. S., Riboli, E., Rodriguez-Barranco, M., Romm, J., Sakoda, L. C., Schafmayer, C., Schoen, R. E., Seminara, D., Shah, M., Shelford, T., Shin, M., Shulman, K., Sieri, S., Slattery, M. L., Southey, M. C., Stadler, Z. K., Stegmaier, C., Su, Y., Tangen, C. M., Thibodeau, S. N., Thomas, D. C., Thomas, S. S., Toland, A. E., Trichopoulou, A., Ulrich, C. M., Van den Berg, D. J., van Duijnhoven, F. B., Van Guelpen, B., van Kranen, H., Vijai, J., Visvanathan, K., Vodicka, P., Vodickova, L., Vymetalkova, V., Weigl, K., Weinstein, S. J., White, E., Win, A., Wolf, C., Wolk, A., Woods, M. O., Wu, A. H., Zaidi, S. H., Zanke, B. W., Zhang, Q., Zheng, W., Scacheri, P. C., Potter, J. D., Bassik, M. C., Kundaje, A., Casey, G., Moreno, V., Abecasis, G. R., Nickerson, D. A., Gruber, S. B., Hsu, L., Peters, U. 2019; 51 (1): 76-+
  • CrowdVariant: a crowdsourcing approach to classify copy number variants. Pacific Symposium on Biocomputing Greenside, P., Zook, J., Salit, M., Cule, M., Poplin, R., DePristo, M. 2019; 24: 224–35

    Abstract

    Copy number variants (CNVs) are an important type of genetic variation that plays a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (<10 kbp) CNVs. Manual curation by experts remains the widely accepted gold standard but cannot scale with the pace of sequencing, particularly in fast-growing clinical applications. We present the first proof-of-principle study demonstrating high throughput manual curation of putative CNVs by non-experts. We developed a crowdsourcing framework, called CrowdVariant, that leverages Google's high-throughput crowdsourcing platform to create a high confidence set of deletions for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle (GIAB) Consortium. We show that non-experts tend to agree both with each other and with experts on putative CNVs. We show that crowdsourced non-expert classifications can be used to accurately assign copy number status to putative CNV calls and identify 1,781 high confidence deletions in a reference sample. Multiple lines of evidence suggest these calls are a substantial improvement over existing CNV callsets and can also be useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology takes the first step toward showing the clinical potential for manual curation of CNVs at scale and can further guide other crowdsourcing genomics applications.

    View details for PubMedID 30864325

  • Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PloS one Movva, R., Greenside, P., Marinov, G. K., Nair, S., Shrikumar, A., Kundaje, A. 2019; 14 (6): e0218073

    Abstract

    The relationship between noncoding DNA sequence and gene expression is not well-understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.

    View details for DOI 10.1371/journal.pone.0218073

    View details for PubMedID 31206543

  • Discovery of common and rare genetic risk variants for colorectal cancer. Nature genetics Huyghe, J. R., Bien, S. A., Harrison, T. A., Kang, H. M., Chen, S., Schmit, S. L., Conti, D. V., Qu, C., Jeon, J., Edlund, C. K., Greenside, P., Wainberg, M., Schumacher, F. R., Smith, J. D., Levine, D. M., Nelson, S. C., Sinnott-Armstrong, N. A., Albanes, D., Alonso, M. H., Anderson, K., Arnau-Collell, C., Arndt, V., Bamia, C., Banbury, B. L., Baron, J. A., Berndt, S. I., Bezieau, S., Bishop, D. T., Boehm, J., Boeing, H., Brenner, H., Brezina, S., Buch, S., Buchanan, D. D., Burnett-Hartman, A., Butterbach, K., Caan, B. J., Campbell, P. T., Carlson, C. S., Castellvi-Bel, S., Chan, A. T., Chang-Claude, J., Chanock, S. J., Chirlaque, M., Cho, S. H., Connolly, C. M., Cross, A. J., Cuk, K., Curtis, K. R., de la Chapelle, A., Doheny, K. F., Duggan, D., Easton, D. F., Elias, S. G., Elliott, F., English, D. R., Feskens, E. J., Figueiredo, J. C., Fischer, R., FitzGerald, L. M., Forman, D., Gala, M., Gallinger, S., Gauderman, W. J., Giles, G. G., Gillanders, E., Gong, J., Goodman, P. J., Grady, W. M., Grove, J. S., Gsur, A., Gunter, M. J., Haile, R. W., Hampe, J., Hampel, H., Harlid, S., Hayes, R. B., Hofer, P., Hoffmeister, M., Hopper, J. L., Hsu, W., Huang, W., Hudson, T. J., Hunter, D. J., Ibanez-Sanz, G., Idos, G. E., Ingersoll, R., Jackson, R. D., Jacobs, E. J., Jenkins, M. A., Joshi, A. D., Joshu, C. E., Keku, T. O., Key, T. J., Kim, H. R., Kobayashi, E., Kolonel, L. N., Kooperberg, C., Kuhn, T., Kury, S., Kweon, S., Larsson, S. C., Laurie, C. A., Le Marchand, L., Leal, S. M., Lee, S. C., Lejbkowicz, F., Lemire, M., Li, C. I., Li, L., Lieb, W., Lin, Y., Lindblom, A., Lindor, N. M., Ling, H., Louie, T. L., Mannisto, S., Markowitz, S. D., Martin, V., Masala, G., McNeil, C. E., Melas, M., Milne, R. L., Moreno, L., Murphy, N., Myte, R., Naccarati, A., Newcomb, P. A., Offit, K., Ogino, S., Onland-Moret, N. C., Pardini, B., Parfrey, P. S., Pearlman, R., Perduca, V., Pharoah, P. D., Pinchev, M., Platz, E. A., Prentice, R. L., Pugh, E., Raskin, L., Rennert, G., Rennert, H. S., Riboli, E., Rodriguez-Barranco, M., Romm, J., Sakoda, L. C., Schafmayer, C., Schoen, R. E., Seminara, D., Shah, M., Shelford, T., Shin, M., Shulman, K., Sieri, S., Slattery, M. L., Southey, M. C., Stadler, Z. K., Stegmaier, C., Su, Y., Tangen, C. M., Thibodeau, S. N., Thomas, D. C., Thomas, S. S., Toland, A. E., Trichopoulou, A., Ulrich, C. M., Van Den Berg, D. J., van Duijnhoven, F. J., Van Guelpen, B., van Kranen, H., Vijai, J., Visvanathan, K., Vodicka, P., Vodickova, L., Vymetalkova, V., Weigl, K., Weinstein, S. J., White, E., Win, A. K., Wolf, C. R., Wolk, A., Woods, M. O., Wu, A. H., Zaidi, S. H., Zanke, B. W., Zhang, Q., Zheng, W., Scacheri, P. C., Potter, J. D., Bassik, M. C., Kundaje, A., Casey, G., Moreno, V., Abecasis, G. R., Nickerson, D. A., Gruber, S. B., Hsu, L., Peters, U. 2018

    Abstract

    To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3% frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10⁻⁸, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Kruppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of biology underlying this risk and influence personalized screening strategies and drug development.

    View details for PubMedID 30510241

  • Intertumoral Heterogeneity in SCLC Is Influenced by the Cell Type of Origin. Cancer discovery Yang, D., Denny, S. K., Greenside, P. G., Chaikovsky, A. C., Brady, J. J., Ouadah, Y., Granja, J. M., Jahchan, N. S., Lim, J. S., Kwok, S., Kong, C. S., Berghoff, A. S., Schmitt, A., Reinhardt, H. C., Park, K., Preusser, M., Kundaje, A., Greenleaf, W. J., Sage, J., Winslow, M. M. 2018

    Abstract

    The extent to which early events shape tumor evolution is largely uncharacterized, even though a better understanding of these early events may help identify key vulnerabilities in advanced tumors. Here, using genetically defined mouse models of small cell lung cancer (SCLC), we uncovered distinct metastatic programs attributable to the cell type of origin. In one model, tumors gain metastatic ability through amplification of the transcription factor NFIB and a widespread increase in chromatin accessibility, whereas in the other model, tumors become metastatic in the absence of NFIB-driven chromatin alterations. Gene-expression and chromatin accessibility analyses identify distinct mechanisms as well as markers predictive of metastatic progression in both groups. Underlying the difference between the two programs was the cell type of origin of the tumors, with NFIB-independent metastases arising from mature neuroendocrine cells. Our findings underscore the importance of the identity of cell type of origin in influencing tumor evolution and metastatic mechanisms. SIGNIFICANCE: We show that SCLC can arise from different cell types of origin, which profoundly influences the eventual genetic and epigenetic changes that enable metastatic progression. Understanding intertumoral heterogeneity in SCLC, and across cancer types, may illuminate mechanisms of tumor progression and uncover how the cell type of origin affects tumor evolution. Cancer Discov; 8(10); 1-16. ©2018 AACR. See related commentary by Pozo et al., p. 1216.

    View details for PubMedID 30228179

  • Prediction of protein-ligand interactions from paired protein sequence motifs and ligand substructures. Pacific Symposium on Biocomputing Greenside, P., Hillenmeyer, M., Kundaje, A. 2018; 23: 20–31

    Abstract

    Identification of small molecule ligands that bind to proteins is a critical step in drug discovery. Computational methods have been developed to accelerate the prediction of protein-ligand binding, but often depend on 3D protein structures. As only a limited number of protein 3D structures have been resolved, the ability to predict protein-ligand interactions without relying on a 3D representation would be highly valuable. We use an interpretable confidence-rated boosting algorithm to predict protein-ligand interactions with high accuracy from ligand chemical substructures and protein 1D sequence motifs, without relying on 3D protein structures. We compare several protein motif definitions, assess generalization of our model's predictions to unseen proteins and ligands, demonstrate recovery of well established interactions and identify globally predictive protein-ligand motif pairs. By bridging biological and chemical perspectives, we demonstrate that it is possible to predict protein-ligand interactions using only motif-based features and that interpretation of these features can reveal new insights into the molecular mechanics underlying each interaction. Our work also lays a foundation to explore more predictive feature sets and sophisticated machine learning approaches as well as other applications, such as predicting unintended interactions or the effects of mutations.

    View details for PubMedID 29218866

  • Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics (Oxford, England) Greenside, P., Shimko, T., Fordyce, P., Kundaje, A. 2018; 34 (17): i629–i637

    Abstract

    Transcription factors bind regulatory DNA sequences in a combinatorial manner to modulate gene expression. Deep neural networks (DNNs) can learn the cis-regulatory grammars encoded in regulatory DNA sequences associated with transcription factor binding and chromatin accessibility. Several feature attribution methods have been developed for estimating the predictive importance of individual features (nucleotides or motifs) in any input DNA sequence to its associated output prediction from a DNN model. However, these methods do not reveal higher-order feature interactions encoded by the models. We present a new method called Deep Feature Interaction Maps (DFIM) to efficiently estimate interactions between all pairs of features in any input DNA sequence. DFIM accurately identifies ground truth motif interactions embedded in simulated regulatory DNA sequences. DFIM identifies synergistic interactions between GATA1 and TAL1 motifs from in vivo TF binding models. DFIM reveals epistatic interactions involving nucleotides flanking the core motif of the Cbf1 TF in yeast from in vitro TF binding models. We also apply DFIM to regulatory sequence models of in vivo chromatin accessibility to reveal interactions between regulatory genetic variants and proximal motifs of target TFs as validated by TF binding quantitative trait loci. Our approach makes significant strides in improving the interpretability of deep learning models for genomics. Code is available at: https://github.com/kundajelab/dfim. Supplementary data are available at Bioinformatics online.

    View details for PubMedID 30423062

  • Impact of regulatory variation across human iPSCs and differentiated cells GENOME RESEARCH Banovich, N. E., Li, Y. I., Raj, A., Ward, M. C., Greenside, P., Calderon, D., Tung, P., Burnett, J. E., Myrthil, M., Thomas, S. M., Burrows, C. K., Romero, I., Pavlovic, B. J., Kundaje, A., Pritchard, J. K., Gilad, Y. 2018; 28 (1): 122–31

    Abstract

    Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.

    View details for PubMedID 29208628

  • A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles CELL Subramanian, A., Narayan, R., Corsello, S. M., Peck, D. D., Natoli, T. E., Lu, X., Gould, J., Davis, J. F., Tubelli, A. A., Asiedu, J. K., Lahr, D. L., Hirschman, J. E., Liu, Z., Donahue, M., Julian, B., Khan, M., Wadden, D., Smith, I. C., Lam, D., Liberzon, A., Toder, C., Bagul, M., Orzechowski, M., Enache, O. M., Piccioni, F., Johnson, S. A., Lyons, N. J., Berger, A. H., Shamji, A. F., Brooks, A. N., Vrcic, A., Flynn, C., Rosains, J., Takeda, D. Y., Hu, R., Davison, D., Lamb, J., Ardlie, K., Hogstrom, L., Greenside, P., Gray, N. S., Clemons, P. A., Silver, S., Wu, X., Zhao, W., Read-Button, W., Wu, X., Haggarty, S. J., Ronco, L. V., Boehm, J. S., Schreiber, S. L., Doench, J. G., Bittker, J. A., Root, D. E., Wong, B., Golub, T. R. 2017; 171 (6): 1437-+

    Abstract

    We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.

    View details for DOI 10.1016/j.cell.2017.10.049

    View details for Web of Science ID 000417362700023

    View details for PubMedID 29195078

  • Enrichment of colorectal cancer associations in functional regions: Insight for using epigenomics data in the analysis of whole genome sequence-imputed GWAS data PLOS ONE Bien, S. A., Auer, P. L., Harrison, T. A., Qu, C., Connolly, C. M., Greenside, P. G., Chen, S., Berndt, S. I., Bezieau, S., Kang, H. M., Huyghe, J., Brenner, H., Casey, G., Chan, A. T., Hopper, J. L., Banbury, B. L., Chang-Claude, J., Chanock, S. J., Haile, R. W., Hoffmeister, M., Fuchsberger, C., Jenkins, M. A., Leal, S. M., Lemire, M., Newcomb, P. A., Gallinger, S., Potter, J. D., Schoen, R. E., Slattery, M. L., Smith, J. D., Le Marchand, L., White, E., Zanke, B. W., Abecasis, G. R., Carlson, C. S., Peters, U., Nickerson, D. A., Kundaje, A., Hsu, L., GECCO CCFR 2017; 12 (11): e0186518

    Abstract

    The evaluation of less frequent genetic variants and their effect on complex disease poses new challenges for genomic research. To investigate whether epigenetic data can be used to inform aggregate rare-variant association methods (RVAM), we assessed whether variants more significantly associated with colorectal cancer (CRC) were preferentially located in non-coding regulatory regions, and whether enrichment was specific to colorectal tissues. Active regulatory elements (ARE) were mapped using data from 127 tissues and cell-types from NIH Roadmap Epigenomics and Encyclopedia of DNA Elements (ENCODE) projects. We investigated whether CRC association p-values were more significant for common variants 1) inside versus outside AREs, or 2) inside colorectal (CR) AREs versus AREs of other tissues and cell-types. We employed an integrative epigenomic RVAM for variants with allele frequency <1%. Gene sets were defined as ARE variants within 200 kilobases of a transcription start site (TSS) using either CR ARE or ARE from non-digestive tissues. CRC-set association p-values were used to evaluate enrichment of less frequent variant associations in CR ARE versus non-digestive ARE. ARE from 126/127 tissues and cell-types were significantly enriched for stronger CRC-variant associations. Strongest enrichment was observed for digestive tissues and immune cell types. CR-specific ARE were also enriched for stronger CRC-variant associations compared to ARE combined across non-digestive tissues (p-value = 9.6 × 10⁻⁴). Additionally, we found enrichment of stronger CRC association p-values for rare variant sets of CR ARE compared to non-digestive ARE (p-value = 0.029). Integrative epigenomic RVAM may enable discovery of less frequent variants associated with CRC, and ARE of digestive and immune tissues are most informative. Although distance-based aggregation of less frequent variants in CR ARE surrounding TSS showed modest enrichment, future association studies would likely benefit from joint analysis of transcriptomes and epigenomes to better link regulatory variation with target genes.

    View details for PubMedID 29161273

  • Molecular definition of a metastatic lung cancer state reveals a targetable CD109-Janus kinase-Stat axis. Nature medicine Chuang, C., Greenside, P. G., Rogers, Z. N., Brady, J. J., Yang, D., Ma, R. K., Caswell, D. R., Chiou, S., Winters, A. F., Grüner, B. M., Ramaswami, G., Spencley, A. L., Kopecky, K. E., Sayles, L. C., Sweet-Cordero, E. A., Li, J. B., Kundaje, A., Winslow, M. M. 2017; 23 (3): 291-300

    Abstract

    Lung cancer is the leading cause of cancer deaths worldwide, with the majority of mortality resulting from metastatic spread. However, the molecular mechanism by which cancer cells acquire the ability to disseminate from primary tumors, seed distant organs, and grow into tissue-destructive metastases remains incompletely understood. We combined tumor barcoding in a mouse model of human lung adenocarcinoma with unbiased genomic approaches to identify a transcriptional program that confers metastatic ability and predicts patient survival. Small-scale in vivo screening identified several genes, including Cd109, that encode novel pro-metastatic factors. We uncovered signaling mediated by Janus kinases (Jaks) and the transcription factor Stat3 as a critical, pharmacologically targetable effector of CD109-driven lung cancer metastasis. In summary, by coupling the systematic genomic analysis of purified cancer cells in distinct malignant states from mouse models with extensive human validation, we uncovered several key regulators of metastatic ability, including an actionable pro-metastatic CD109-Jak-Stat3 axis.

    View details for DOI 10.1038/nm.4285

    View details for PubMedID 28191885

  • Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nature genetics Mumbach, M. R., Satpathy, A. T., Boyle, E. A., Dai, C., Gowen, B. G., Cho, S. W., Nguyen, M. L., Rubin, A. J., Granja, J. M., Kazane, K. R., Wei, Y., Nguyen, T., Greenside, P. G., Corces, M. R., Tycko, J., Simeonov, D. R., Suliman, N., Li, R., Xu, J., Flynn, R. A., Kundaje, A., Khavari, P. A., Marson, A., Corn, J. E., Quertermous, T., Greenleaf, W. J., Chang, H. Y. 2017

    Abstract

    The challenge of linking intergenic mutations to target genes has limited molecular understanding of human diseases. Here we show that H3K27ac HiChIP generates high-resolution contact maps of active enhancers and target genes in rare primary human T cell subtypes and coronary artery smooth muscle cells. Differentiation of naive T cells into T helper 17 cells or regulatory T cells creates subtype-specific enhancer-promoter interactions, specifically at regions of shared DNA accessibility. These data provide a principled means of assigning molecular functions to autoimmune and cardiovascular disease risk variants, linking hundreds of noncoding variants to putative gene targets. Target genes identified with HiChIP are further supported by CRISPR interference and activation at linked enhancers, by the presence of expression quantitative trait loci, and by allele-specific enhancer loops in patient-derived primary cells. The majority of disease-associated enhancers contact genes beyond the nearest gene in the linear genome, leading to a fourfold increase in the number of potential target genes for autoimmune and cardiovascular diseases.

    View details for PubMedID 28945252

  • An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nature methods Corces, M. R., Trevino, A. E., Hamilton, E. G., Greenside, P. G., Sinnott-Armstrong, N. A., Vesuna, S., Satpathy, A. T., Rubin, A. J., Montine, K. S., Wu, B., Kathiria, A., Cho, S. W., Mumbach, M. R., Carter, A. C., Kasowski, M., Orloff, L. A., Risca, V. I., Kundaje, A., Khavari, P. A., Montine, T. J., Greenleaf, W. J., Chang, H. Y. 2017

    Abstract

    We present Omni-ATAC, an improved ATAC-seq protocol for chromatin accessibility profiling that works across multiple applications with substantial improvement of signal-to-background ratio and information content. The Omni-ATAC protocol generates chromatin accessibility profiles from archival frozen tissue samples and 50-μm sections, revealing the activities of disease-associated DNA elements in distinct human brain structures. The Omni-ATAC protocol enables the interrogation of personal regulomes in tissue context and translational studies.

    View details for PubMedID 28846090

  • Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature genetics Corces, M. R., Buenrostro, J. D., Wu, B., Greenside, P. G., Chan, S. M., Koenig, J. L., Snyder, M. P., Pritchard, J. K., Kundaje, A., Greenleaf, W. J., Majeti, R., Chang, H. Y. 2016; 48 (10): 1193-1203

    Abstract

    We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.

    View details for DOI 10.1038/ng.3646

    View details for PubMedID 27526324

  • An Arntl2-Driven Secretome Enables Lung Adenocarcinoma Metastatic Self-Sufficiency CANCER CELL Brady, J. J., Chuang, C., Greenside, P. G., Rogers, Z. N., Murray, C. W., Caswell, D. R., Hartmann, U., Connolly, A. J., Sweet-Cordero, E. A., Kundaje, A., Winslow, M. M. 2016; 29 (5): 697-710

    Abstract

    The ability of cancer cells to establish lethal metastatic lesions requires the survival and expansion of single cancer cells at distant sites. The factors controlling the clonal growth ability of individual cancer cells remain poorly understood. Here, we show that high expression of the transcription factor ARNTL2 predicts poor lung adenocarcinoma patient outcome. Arntl2 is required for metastatic ability in vivo and clonal growth in cell culture. Arntl2 drives metastatic self-sufficiency by orchestrating the expression of a complex pro-metastatic secretome. We identify Clock as an Arntl2 partner and functionally validate the matricellular protein Smoc2 as a pro-metastatic secreted factor. These findings shed light on the molecular mechanisms that enable single cancer cells to form allochthonous tumors in foreign tissue environments.

    View details for DOI 10.1016/j.ccell.2016.03.003

    View details for PubMedID 27150038

  • Relating Chemical Structure to Cellular Response: An Integrative Analysis of Gene Expression, Bioactivity, and Structural Data Across 11,000 Compounds. CPT: pharmacometrics & systems pharmacology Chen, B., Greenside, P., Paik, H., Sirota, M., Hadley, D., Butte, A. J. 2015; 4 (10): 576-584

    Abstract

    A central premise in systems pharmacology is that structurally similar compounds have similar cellular responses; however, this principle often does not hold. One of the most widely used measures of cellular response is gene expression. By integrating gene expression data from Library of Integrated Network-based Cellular Signatures (LINCS) with chemical structure and bioactivity data from PubChem, we performed a large-scale correlation analysis of chemical structures and gene expression profiles of over 11,000 compounds taking into account confounding factors such as biological conditions (e.g., cell line, dose) and bioactivities. We found that structurally similar compounds do indeed yield similar gene expression profiles. There is an ∼20% chance that two structurally similar compounds (Tanimoto Coefficient ≥ 0.85) share significantly similar gene expression profiles. Regardless of structural similarity, two compounds tend to share similar gene expression profiles in a cell line when they are administrated at a higher dose or when the cell line is sensitive to both compounds.

    View details for DOI 10.1002/psp4.12009

    View details for PubMedID 26535158

    View details for PubMedCentralID PMC4625862

  • Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell Grubert, F., Zaugg, J. B., Kasowski, M., Ursu, O., Spacek, D. V., Martin, A. R., Greenside, P., Srivas, R., Phanstiel, D. H., Pekowska, A., Heidari, N., Euskirchen, G., Huber, W., Pritchard, J. K., Bustamante, C. D., Steinmetz, L. M., Kundaje, A., Snyder, M. 2015; 162 (5): 1051-1065

    Abstract

    Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.

    View details for DOI 10.1016/j.cell.2015.07.048

    View details for PubMedID 26300125

    View details for Web of Science ID 000360589900015

    View details for PubMedCentralID PMC4556133