Honors & Awards
Baxter Faculty Scholar Award, Donald E. and Delia B. Baxter Foundation (2020)
Interdisciplinary Initiatives Seed Grants Program Award, Stanford Bio-X (2018 - 2020)
Stem Cell Research Award, Stinehart/Reed Foundation (2018 - 2020)
K99/R00 Pathway to Independence Award, NIH/NCI (2015 - 2020)
Visionary Postdoctoral Fellowship, Dept. of Defense (2012 - 2015)
NIH T32 Cancer Biology Training Grant, Stanford University (2012)
Siebel Fellow, The Siebel Stem Cell Institute (Stanford/UC Berkeley) (2011 - 2015)
Graduate Dissertation Fellowship Award, UCSB (2009)
Boards, Advisory Committees, Professional Organizations
Member, International Society for Computational Biology (ISCB) (2008 - Present)
Associate Member, American Association for Cancer Research (AACR) (2012 - Present)
PhD, University of California, Santa Barbara, Biomolecular Science and Engineering Program (2010)
Current Research and Scholarly Interests
Our group combines computational and experimental techniques to study the cellular organization of complex tissues, with a focus on determining the phenotypic diversity and clinical significance of tumor cell subsets. We have a particular interest in developing innovative data science tools that illuminate the cellular hierarchies and stromal elements that underlie tumor initiation, progression, and response to therapy. As part of this focus, we develop new algorithms to resolve cellular states and multicellular communities, tumor developmental hierarchies, and single-cell spatial relationships from genomic profiles of clinical biospecimens. Key results are further explored experimentally, both in our lab and through collaboration, with the goal of translating promising findings into the clinic.
As a member of the Department of Biomedical Data Science and the Institute for Stem Cell Biology and Regenerative Medicine, and as an affiliate of graduate programs in Biomedical Informatics, Cancer Biology, and Immunology, we are also interested in the development of impactful biomedical data science tools in areas beyond our immediate research focus, including developmental biology, regenerative medicine, and systems immunology.
- Bioinformatics for Stem Cell and Cancer Biology
STEMREM 205 (Win)
- Workshop in Biostatistics
BIODS 260B, STATS 260B (Win)
Independent Studies (4)
- Graduate Research
IMMUNOL 399 (Aut, Win, Spr, Sum)
- Graduate Research
STEMREM 399 (Spr, Sum)
- Out-of-Department Advanced Research Laboratory in Experimental Biology
BIO 199X (Spr)
- Undergraduate Research
STEMREM 199 (Spr, Sum)
- Graduate Research
Graduate and Fellowship Programs
Biomedical Informatics (PhD Program)
Single-cell transcriptional diversity is a hallmark of developmental potential.
Science (New York, N.Y.)
2020; 367 (6476): 405–11
Single-cell RNA sequencing (scRNA-seq) is a powerful approach for reconstructing cellular differentiation trajectories. However, inferring both the state and direction of differentiation is challenging. Here, we demonstrate a simple, yet robust, determinant of developmental potential (the number of expressed genes per cell) and leverage this measure of transcriptional diversity to develop a computational framework (CytoTRACE) for predicting differentiation states from scRNA-seq data. When applied to diverse tissue types and organisms, CytoTRACE outperformed previous methods and nearly 19,000 annotated gene sets for resolving 52 experimentally determined developmental trajectories. Additionally, it facilitated the identification of quiescent stem cells and revealed genes that contribute to breast tumorigenesis. This study thus establishes a key RNA-based feature of developmental potential and a platform for delineation of cellular hierarchies.
View details for DOI 10.1126/science.aax0249
View details for PubMedID 31974247
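The abstract above names the number of expressed genes per cell as its measure of transcriptional diversity. The following is a minimal sketch of that raw feature only, not the CytoTRACE framework itself; the function names, the list-of-lists count matrix layout, and the detection threshold of zero are assumptions made for illustration.

```python
def genes_expressed_per_cell(counts, threshold=0):
    """Count, for each cell, how many genes exceed the detection threshold.

    counts: list of rows, one row per cell, each row a list of per-gene counts
            (hypothetical layout chosen for this sketch).
    threshold: a gene is called "expressed" when its count is strictly greater
               than this value (assumed default of 0).
    """
    return [sum(1 for c in row if c > threshold) for row in counts]


def rank_cells_by_gene_count(counts, threshold=0):
    """Return cell indices ordered from most to fewest expressed genes."""
    gene_counts = genes_expressed_per_cell(counts, threshold)
    return sorted(range(len(gene_counts)), key=lambda i: gene_counts[i], reverse=True)


if __name__ == "__main__":
    # Three toy cells by four genes; values are invented for the example.
    toy = [
        [0, 3, 1, 5],
        [0, 0, 2, 0],
        [4, 1, 1, 0],
    ]
    print(genes_expressed_per_cell(toy))
    print(rank_cells_by_gene_count(toy))
```

Under the abstract's premise, cells nearer the top of this ranking would be candidates for higher developmental potential; the published method builds further steps on top of this feature that are not reproduced here.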
Determining cell type abundance and expression from bulk tissues with digital cytometry.
Single-cell RNA-sequencing has emerged as a powerful technique for characterizing cellular heterogeneity, but it is currently impractical on large sample cohorts and cannot be applied to fixed specimens collected as part of routine clinical care. We previously developed an approach for digital cytometry, called CIBERSORT, that enables estimation of cell type abundances from bulk tissue transcriptomes. We now introduce CIBERSORTx, a machine learning method that extends this framework to infer cell-type-specific gene expression profiles without physical cell isolation. By minimizing platform-specific variation, CIBERSORTx also allows the use of single-cell RNA-sequencing data for large-scale tissue dissection. We evaluated the utility of CIBERSORTx in multiple tumor types, including melanoma, where single-cell reference profiles were used to dissect bulk clinical specimens, revealing cell-type-specific phenotypic states linked to distinct driver mutations and response to immune checkpoint blockade. We anticipate that digital cytometry will augment single-cell profiling efforts, enabling cost-effective, high-throughput tissue characterization without the need for antibodies, disaggregation or viable cells.
View details for PubMedID 31061481
Integrated digital error suppression for improved detection of circulating tumor DNA
View details for DOI 10.1038/nbt.3520
Robust enumeration of cell subsets from tissue expression profiles
View details for DOI 10.1038/nmeth.3337
The prognostic landscape of genes and infiltrating immune cells across human cancers
View details for DOI 10.1038/nm.3909
An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage
2014; 20 (5): 552-558
Circulating tumor DNA (ctDNA) is a promising biomarker for noninvasive assessment of cancer burden, but existing ctDNA detection methods have insufficient sensitivity or patient coverage for broad clinical applicability. Here we introduce cancer personalized profiling by deep sequencing (CAPP-Seq), an economical and ultrasensitive method for quantifying ctDNA. We implemented CAPP-Seq for non-small-cell lung cancer (NSCLC) with a design covering multiple classes of somatic alterations that identified mutations in >95% of tumors. We detected ctDNA in 100% of patients with stage II-IV NSCLC and in 50% of patients with stage I, with 96% specificity for mutant allele fractions down to ∼0.02%. Levels of ctDNA were highly correlated with tumor volume and distinguished between residual disease and treatment-related imaging changes, and measurement of ctDNA levels allowed for earlier response assessment than radiographic approaches. Finally, we evaluated biopsy-free tumor screening and genotyping with CAPP-Seq. We envision that CAPP-Seq could be routinely applied clinically to detect and monitor diverse malignancies, thus facilitating personalized cancer therapy.
View details for DOI 10.1038/nm.3519
View details for Web of Science ID 000335710700028
Identification of a colonial chordate histocompatibility gene
2013; 341 (6144): 384-387
View details for DOI 10.1126/science.1238036
Lab-Specific Gene Expression Signatures in Pluripotent Stem Cells
CELL STEM CELL
2010; 7 (2): 258-262
Pluripotent stem cells derived from both embryonic and reprogrammed somatic cells have significant potential for human regenerative medicine. Despite similarities in developmental potential, however, several groups have found fundamental differences between embryonic stem cell (ESC) and induced-pluripotent stem cell (iPSC) lines that may have important implications for iPSC-based medical therapies. Using an unsupervised clustering algorithm, we further studied the genetic homogeneity of iPSC and ESC lines by reanalyzing microarray gene expression data from seven different laboratories. Unexpectedly, this analysis revealed a strong correlation between gene expression signatures and specific laboratories in both ESC and iPSC lines. Nearly one-third of the genes with lab-specific expression signatures are also differentially expressed between ESCs and iPSCs. These data are consistent with the hypothesis that in vitro microenvironmental context differentially impacts the gene expression signatures of both iPSCs and ESCs.
View details for DOI 10.1016/j.stem.2010.06.016
View details for Web of Science ID 000281107400017
View details for PubMedID 20682451
LEFTY1 Is a Dual-SMAD Inhibitor that Promotes Mammary Progenitor Growth and Tumorigenesis.
Cell stem cell
SMAD pathways govern epithelial proliferation, and transforming growth factor beta (TGF-beta) and BMP signaling through SMAD members has distinct effects on mammary development and homeostasis. Here, we show that LEFTY1, a secreted inhibitor of NODAL/SMAD2 signaling, is produced by mammary progenitor cells and, concomitantly, suppresses SMAD2 and SMAD5 signaling to promote long-term proliferation of normal and malignant mammary epithelial cells. In contrast, BMP7, a NODAL antagonist with context-dependent functions, is produced by basal cells and restrains progenitor cell proliferation. In normal mouse epithelium, LEFTY1 expression in a subset of luminal cells and rare basal cells opposes BMP7 to promote ductal branching. LEFTY1 binds BMPR2 to suppress BMP7-induced activation of SMAD5, and this LEFTY1-BMPR2 interaction is specific to tumor-initiating cells in triple-negative breast cancer xenografts that rely on LEFTY1 for growth. These results suggest that LEFTY1 is an endogenous dual-SMAD inhibitor and that suppressing its function may represent a therapeutic vulnerability in breast cancer.
View details for DOI 10.1016/j.stem.2020.06.017
View details for PubMedID 32693087
Noninvasive Early Identification of Therapeutic Benefit from Immune Checkpoint Inhibition.
Although treatment of non-small cell lung cancer (NSCLC) with immune checkpoint inhibitors (ICIs) can produce remarkably durable responses, most patients develop early disease progression. Furthermore, initial response assessment by conventional imaging is often unable to identify which patients will achieve durable clinical benefit (DCB). Here, we demonstrate that pre-treatment circulating tumor DNA (ctDNA) and peripheral CD8 T cell levels are independently associated with DCB. We further show that ctDNA dynamics after a single infusion can aid in identification of patients who will achieve DCB. Integrating these determinants, we developed and validated an entirely noninvasive multiparameter assay (DIREct-On, Durable Immunotherapy Response Estimation by immune profiling and ctDNA-On-treatment) that robustly predicts which patients will achieve DCB with higher accuracy than any individual feature. Taken together, these results demonstrate that integrated ctDNA and circulating immune cell profiling can provide accurate, noninvasive, and early forecasting of ultimate outcomes for NSCLC patients receiving ICIs.
View details for DOI 10.1016/j.cell.2020.09.001
View details for PubMedID 33007267
Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx.
Methods in molecular biology (Clifton, N.J.)
2020; 2117: 135–57
CIBERSORTx is a suite of machine learning tools for the assessment of cellular abundance and cell type-specific gene expression patterns from bulk tissue transcriptome profiles. With this framework, single-cell or bulk-sorted RNA sequencing data can be used to learn molecular signatures of distinct cell types from a small collection of biospecimens. These signatures can then be repeatedly applied to characterize cellular heterogeneity from bulk tissue transcriptomes without physical cell isolation. In this chapter, we provide a detailed primer on CIBERSORTx and demonstrate its capabilities for high-throughput profiling of cell types and cellular states in normal and neoplastic tissues.
View details for DOI 10.1007/978-1-0716-0301-7_7
View details for PubMedID 31960376
Atlas of clinically-distinct cell states and cellular ecosystems across human solid tumors
View details for Web of Science ID 000496473200425
- The Immune Landscape of Cancer. Immunity 2019; 51 (2): 411–12
Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction.
Accurate prediction of long-term outcomes remains a challenge in the care of cancer patients. Due to the difficulty of serial tumor sampling, previous prediction tools have focused on pretreatment factors. However, emerging non-invasive diagnostics have increased opportunities for serial tumor assessments. We describe the Continuous Individualized Risk Index (CIRI), a method to dynamically determine outcome probabilities for individual patients utilizing risk predictors acquired over time. Similar to "win probability" models in other fields, CIRI provides a real-time probability by integrating risk assessments throughout a patient's course. Applying CIRI to patients with diffuse large B cell lymphoma, we demonstrate improved outcome prediction compared to conventional risk models. We demonstrate CIRI's broader utility in analogous models of chronic lymphocytic leukemia and breast adenocarcinoma and perform a proof-of-concept analysis demonstrating how CIRI could be used to develop predictive biomarkers for therapy selection. We envision that dynamic risk assessment will facilitate personalized medicine and enable innovative therapeutic paradigms.
View details for DOI 10.1016/j.cell.2019.06.011
View details for PubMedID 31280963
- A tumor deconvolution DREAM Challenge: Inferring immune infiltration from bulk gene expression data AMER ASSOC CANCER RESEARCH. 2019
A functional subset of CD8+ T cells during chronic exhaustion is defined by SIRPalpha expression.
2019; 10 (1): 794
Prolonged exposure of CD8+ T cells to antigenic stimulation, as in chronic viral infections, leads to a state of diminished function termed exhaustion. We now demonstrate that even during exhaustion there is a subset of functional CD8+ T cells defined by surface expression of SIRPalpha, a protein not previously reported on lymphocytes. On SIRPalpha+ CD8+ T cells, expression of co-inhibitory receptors is counterbalanced by expression of co-stimulatory receptors and it is only SIRPalpha+ cells that actively proliferate, transcribe IFNgamma and show cytolytic activity. Furthermore, target cells that express the ligand for SIRPalpha, CD47, are more susceptible to CD8+ T cell-killing in vivo. SIRPalpha+ CD8+ T cells are evident in mice infected with Friend retrovirus, LCMV Clone 13, and in patients with chronic HCV infections. Furthermore, therapeutic blockade of PD-L1 to reinvigorate CD8+ T cells during chronic infection expands the cytotoxic subset of SIRPalpha+ CD8+ T cells.
View details for PubMedID 30770827
Reply to J. Wang et al.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
View details for PubMedID 30753108
- Spatial mapping of the immune microenvironment in primary triple-negative breast cancer (TNBC) and association with neoadjuvant therapy response AMER ASSOC CANCER RESEARCH. 2019
Computational approaches for characterizing the tumor immune microenvironment.
Recent advances in high-throughput molecular profiling technologies and multiplexed imaging platforms have revolutionized our ability to characterize the tumor immune microenvironment. As a result, studies of tumor-associated immune cells increasingly involve complex datasets that require sophisticated methods of computational analysis. In this review, we present an overview of key assays and related bioinformatics tools for analyzing the tumor-associated immune system in bulk tissues and at the single-cell level. In parallel, we describe how data science strategies and novel technologies have advanced tumor immunology and opened the door for new opportunities to exploit host immunity to improve cancer clinical outcomes.
View details for DOI 10.1111/imm.13101
View details for PubMedID 31347163
Circulating tumor DNA analysis for detection of minimal residual disease after chemoradiotherapy for localized esophageal cancer.
Biomarkers are needed to identify patients at risk of tumor progression following chemoradiotherapy for localized esophageal cancer. These could improve identification of patients at risk for cancer progression and selection of therapy. We performed deep sequencing (CAPP-Seq) analyses of plasma cell-free DNA collected from 45 patients before and after chemoradiotherapy for esophageal cancer, as well as DNA from leukocytes, and fixed esophageal tumor biopsies collected during esophagogastroduodenoscopy. Patients were treated from May 2010 through October 2015; 23 patients subsequently underwent esophagectomy and 22 did not undergo surgery. We also sequenced DNA from blood samples from 40 healthy individuals (controls). We analyzed 802 regions of 607 genes for single-nucleotide variants previously associated with esophageal adenocarcinoma or squamous cell carcinoma. Patients underwent imaging analyses 6-8 weeks after chemoradiotherapy and were followed for 5 years. Our primary aim was to determine whether detection of circulating tumor DNA (ctDNA) following chemoradiotherapy is associated with risk of tumor progression (growth of local, regional, or distant tumors, detected by imaging or biopsy). The median proportion of tumor-derived DNA in total cell-free DNA before treatment was 0.07%, indicating that ultrasensitive assays are needed for quantification and analysis of ctDNA from localized esophageal tumors. Detection of ctDNA following chemoradiotherapy was associated with tumor progression (hazard ratio, 18.7; P<.0001), formation of distant metastases (hazard ratio, 32.1; P<.0001), and shorter disease-specific survival times (hazard ratio, 23.1; P<.0001). A higher proportion of patients with tumor progression had new mutations detected in plasma samples collected after chemoradiotherapy than patients without progression (P=.03). Detection of ctDNA after chemoradiotherapy preceded radiographic evidence of tumor progression by an average of 2.8 months.
Among patients who received chemoradiotherapy without surgery, combined ctDNA and metabolic imaging analysis predicted progression in 100% of patients with tumor progression, compared with 71% for only ctDNA detection and 57% for only metabolic imaging analysis (P<.001 for comparison of either technique to combined analysis). In an analysis of cell-free DNA in blood samples from patients who underwent chemoradiotherapy for esophageal cancer, detection of ctDNA was associated with tumor progression, metastasis, and disease-specific survival. Analysis of ctDNA might be used to identify patients at highest risk for tumor progression.
View details for DOI 10.1053/j.gastro.2019.10.039
View details for PubMedID 31711920
Functional significance of U2AF1 S34F mutations in lung adenocarcinomas.
2019; 10 (1): 5712
The functional role of U2AF1 mutations in lung adenocarcinomas (LUADs) remains incompletely understood. Here, we report a significant co-occurrence of U2AF1 S34F mutations with ROS1 translocations in LUADs. To characterize this interaction, we profiled effects of S34F on the transcriptome-wide distribution of RNA binding and alternative splicing in cells harboring the ROS1 translocation. Compared to its wild-type counterpart, U2AF1 S34F preferentially binds and modulates splicing of introns containing CAG trinucleotides at their 3' splice junctions. The presence of S34F caused a shift in cross-linking at 3' splice sites, which was significantly associated with alternative splicing of skipped exons. U2AF1 S34F induced expression of genes involved in the epithelial-mesenchymal transition (EMT) and increased tumor cell invasion. Finally, S34F increased splicing of the long over the short SLC34A2-ROS1 isoform, which was also associated with enhanced invasiveness. Taken together, our results suggest a mechanistic interaction between mutant U2AF1 and ROS1 in LUAD.
View details for DOI 10.1038/s41467-019-13392-y
View details for PubMedID 31836708
- Method of Isolating and Transplanting the Hematopoietic Stem Cell with Its Microenvironment Which Improves Functional Hematopoietic Engraftment ELSEVIER SCIENCE INC. 2018: E224
Circulating tumor DNA (ctDNA) in B-cell lymphoma
WILEY. 2018: 16–17
View details for Web of Science ID 000444944200019
Circulating Tumor DNA Measurements As Early Outcome Predictors in Diffuse Large B-Cell Lymphoma.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
Purpose: Outcomes for patients with diffuse large B-cell lymphoma remain heterogeneous, with existing methods failing to consistently predict treatment failure. We examined the additional prognostic value of circulating tumor DNA (ctDNA) before and during therapy for predicting patient outcomes. Patients and Methods: We studied the dynamics of ctDNA from 217 patients treated at six centers, using a training and validation framework. We densely characterized early ctDNA dynamics during therapy using cancer personalized profiling by deep sequencing to define response-associated thresholds within a discovery set. These thresholds were assessed in two independent validation sets. Finally, we assessed the prognostic value of ctDNA in the context of established risk factors, including the International Prognostic Index and interim positron emission tomography/computed tomography scans. Results: Before therapy, ctDNA was detectable in 98% of patients; pretreatment levels were prognostic in both front-line and salvage settings. In the discovery set, ctDNA levels changed rapidly, with a 2-log decrease after one cycle (early molecular response [EMR]) and a 2.5-log decrease after two cycles (major molecular response [MMR]) stratifying outcomes. In the first validation set, patients receiving front-line therapy achieving EMR or MMR had superior outcomes at 24 months (EMR: EFS, 83% v 50%; P = .0015; MMR: EFS, 82% v 46%; P < .001). EMR also predicted superior 24-month outcomes in patients receiving salvage therapy in the first validation set (EFS, 100% v 13%; P = .011). The prognostic value of EMR and MMR was further confirmed in the second validation set. In multivariable analyses including International Prognostic Index and interim positron emission tomography/computed tomography scans across both cohorts, molecular response was independently prognostic of outcomes, including event-free and overall survival.
Conclusion: Pretreatment ctDNA levels and molecular responses are independently prognostic of outcomes in aggressive lymphomas. These risk factors could potentially guide future personalized risk-directed approaches.
View details for PubMedID 30125215
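The abstract above defines early molecular response (EMR) as a 2-log decrease in ctDNA after one cycle and major molecular response (MMR) as a 2.5-log decrease after two cycles. The sketch below only illustrates that arithmetic. The function names, the use of base-10 logs of a simple pre/post ratio, and the treatment of an undetectable follow-up value as an unbounded decrease are assumptions of this example, not details taken from the study.

```python
import math

EMR_LOG_DROP = 2.0   # 2-log decrease after one cycle (threshold from the abstract)
MMR_LOG_DROP = 2.5   # 2.5-log decrease after two cycles (threshold from the abstract)


def log_fold_decrease(pretreatment, on_treatment):
    """log10 of the ratio pretreatment / on_treatment.

    Both values are ctDNA levels in the same (hypothetical) units.
    Assumption: an undetectable on-treatment level (0) counts as an
    unbounded decrease.
    """
    if pretreatment <= 0:
        raise ValueError("pretreatment ctDNA level must be positive")
    if on_treatment <= 0:
        return math.inf
    return math.log10(pretreatment / on_treatment)


def has_emr(pretreatment, after_cycle_1):
    """True when the drop after one cycle meets the 2-log threshold."""
    return log_fold_decrease(pretreatment, after_cycle_1) >= EMR_LOG_DROP


def has_mmr(pretreatment, after_cycle_2):
    """True when the drop after two cycles meets the 2.5-log threshold."""
    return log_fold_decrease(pretreatment, after_cycle_2) >= MMR_LOG_DROP


if __name__ == "__main__":
    # Invented values: 1000 units before therapy, 5 after cycle 1, 1 after cycle 2.
    print(has_emr(1000.0, 5.0), has_mmr(1000.0, 1.0))
```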
Circulating Tumor DNA Quantitation for Early Response Assessment of Immune Checkpoint Inhibitors for Metastatic Non-Small Cell Lung Cancer
ELSEVIER SCIENCE INC. 2018: E1–E2
View details for Web of Science ID 000432447200003
Combination approach for detecting different types of alterations in circulating tumor DNA in leiomyosarcoma.
Clinical cancer research : an official journal of the American Association for Cancer Research
The clinical utility of circulating tumor DNA (ctDNA) monitoring has been shown in tumors that harbor highly recurrent mutations. Leiomyosarcoma (LMS) represents a type of tumor with a wide spectrum of heterogeneous genomic abnormalities; thus, targeting hotspot mutations or a narrow genomic region for ctDNA detection may not be practical. Here we demonstrate a combinatorial approach that integrates different sequencing protocols for the orthogonal detection of single nucleotide variants (SNVs), small indels and copy number alterations (CNAs) in ctDNA. We employed Cancer Personalized Profiling by deep Sequencing (CAPP-Seq) for the analysis of SNVs and indels, together with a genome-wide interrogation of CNAs by Genome Representation Profiling (GRP). We profiled 28 longitudinal plasma samples and 25 tumor specimens from 7 patients with LMS. We detected ctDNA in 6 of 7 of these patients with >98% specificity for mutant allele fractions down to a level of 0.01%. We show that results from CAPP-Seq and GRP are highly concordant, and the combination of these methods allows for more comprehensive monitoring of ctDNA by profiling a wide spectrum of tumor-specific markers. By analyzing multiple tumor specimens in individual patients obtained from different sites and at different times during treatment, we observed clonal evolution of these tumors that was reflected by ctDNA profiles. Our strategy allows for a comprehensive monitoring of a broad spectrum of tumor-specific markers in plasma. Our approach may be clinically useful not only in LMS but also in other tumor types that lack recurrent genomic alterations.
View details for PubMedID 29463554
Profiling Tumor Infiltrating Immune Cells with CIBERSORT.
Methods in molecular biology (Clifton, N.J.)
2018; 1711: 243–59
Tumor infiltrating leukocytes (TILs) are an integral component of the tumor microenvironment and have been found to correlate with prognosis and response to therapy. Methods to enumerate immune subsets such as immunohistochemistry or flow cytometry suffer from limitations in phenotypic markers and can be challenging to practically implement and standardize. An alternative approach is to acquire aggregative high dimensional data from cellular mixtures and to subsequently infer the cellular components computationally. We recently described CIBERSORT, a versatile computational method for quantifying cell fractions from bulk tissue gene expression profiles (GEPs). Combining support vector regression with prior knowledge of expression profiles from purified leukocyte subsets, CIBERSORT can accurately estimate the immune composition of a tumor biopsy. In this chapter, we provide a primer on the CIBERSORT method and illustrate its use for characterizing TILs in tumor samples profiled by microarray or RNA-Seq.
View details for PubMedID 29344893
Early B cell changes predict autoimmunity following combination immune checkpoint blockade.
The Journal of clinical investigation
Combination checkpoint blockade (CCB) targeting inhibitory CTLA4 and PD1 receptors holds promise for cancer therapy. Immune-related adverse events (IRAEs) remain a major obstacle for the optimal application of CCB in cancer. Here, we analyzed B cell changes in patients with melanoma following treatment with either anti-CTLA4 or anti-PD1, or in combination. CCB therapy led to changes in circulating B cells that were detectable after the first cycle of therapy and characterized by a decline in circulating B cells and an increase in CD21lo B cells and plasmablasts. PD1 expression was higher in the CD21lo B cells, and B cell receptor sequencing of these cells demonstrated greater clonality and a higher frequency of clones compared with CD21hi cells. CCB induced proliferation in the CD21lo compartment, and single-cell RNA sequencing identified B cell activation in cells with genomic profiles of CD21lo B cells in vivo. Increased clonality of circulating B cells following CCB occurred in some patients. Treatment-induced changes in B cells preceded and correlated with both the frequency and timing of IRAEs. Patients with early B cell changes experienced higher rates of grade 3 or higher IRAEs 6 months after CCB. Thus, early changes in B cells following CCB may identify patients who are at increased risk of IRAEs, and preemptive strategies targeting B cells may reduce toxicities in these patients.
View details for PubMedID 29309048
Complex mammalian-like haematopoietic system found in a colonial chordate.
Haematopoiesis is an essential process that evolved in multicellular animals. At the heart of this process are haematopoietic stem cells (HSCs), which are multipotent and self-renewing, and generate the entire repertoire of blood and immune cells throughout an animal's life [1]. Although there have been comprehensive studies on self-renewal, differentiation, physiological regulation and niche occupation in vertebrate HSCs, relatively little is known about the evolutionary origin and niches of these cells. Here we describe the haematopoietic system of Botryllus schlosseri, a colonial tunicate that has a vasculature and circulating blood cells, and interesting stem-cell biology and immunity characteristics [2-8]. Self-recognition between genetically compatible B. schlosseri colonies leads to the formation of natural parabionts with shared circulation, whereas incompatible colonies reject each other [3,4,7]. Using flow cytometry, whole-transcriptome sequencing of defined cell populations and diverse functional assays, we identify HSCs, progenitors, immune effector cells and an HSC niche, and demonstrate that self-recognition inhibits allospecific cytotoxic reactions. Our results show that HSC and myeloid lineage immune cells emerged in a common ancestor of tunicates and vertebrates, and also suggest that haematopoietic bone marrow and the B. schlosseri endostyle niche evolved from a common origin.
View details for PubMedID 30518860
Genomic Feature Selection by Coverage Design Optimization.
Journal of applied statistics
2018; 45 (14): 2658–76
We introduce a novel data reduction technique whereby we select a subset of tiles to "cover" maximally events of interest in large-scale biological datasets (e.g., genetic mutations), while minimizing the number of tiles. A tile is a genomic unit capturing one or more biological events, such as a sequence of base pairs that can be sequenced and observed simultaneously. The goal is to reduce significantly the number of tiles considered to those with areas of dense events in a cohort, thus saving on cost and enhancing interpretability. However, the reduction should not come at the cost of too much information, allowing for sensible statistical analysis after its application. We envisage application of our methods to a variety of high throughput data types, particularly those produced by next generation sequencing (NGS) experiments. The procedure is cast as a convex optimization problem, which is presented, along with methods of its solution. The method is demonstrated on a large dataset of somatic mutations spanning 5000+ patients, each having one of 29 cancer types. Applied to these data, our method dramatically reduces the number of gene locations required for broad coverage of patients and their mutations, giving subject specialists a more easily interpretable snapshot of recurrent mutational profiles in these cancers. The locations identified coincide with previously identified cancer genes. Finally, despite considerable data reduction, we show that our covering designs preserve the cancer discrimination ability of multinomial logistic regression models trained on all of the locations (> 1M).
View details for PubMedID 30294060
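The abstract above describes choosing a small set of tiles that still covers as many patients and mutations as possible, and states that the published procedure is cast as a convex optimization problem. The sketch below does not reproduce that formulation. It swaps in a plain greedy set-cover heuristic to illustrate the coverage idea, and the function name, the tile-to-patients dictionary, and the `target_fraction` parameter are all hypothetical.

```python
def greedy_tile_cover(tile_to_patients, target_fraction=1.0):
    """Greedily pick tiles until the requested fraction of patients is covered.

    tile_to_patients: dict mapping a tile name to the set of patient IDs that
                      carry at least one event in that tile (hypothetical input).
    target_fraction: fraction of all patients that must be covered before stopping.
    Returns (chosen_tiles_in_order, covered_patients).
    """
    all_patients = set()
    for patients in tile_to_patients.values():
        all_patients |= set(patients)

    needed = target_fraction * len(all_patients)
    covered, chosen = set(), []

    while len(covered) < needed:
        # Pick the tile that adds the most not-yet-covered patients;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(
            sorted(tile_to_patients),
            key=lambda t: len(set(tile_to_patients[t]) - covered),
        )
        gain = set(tile_to_patients[best]) - covered
        if not gain:
            break  # no remaining tile adds coverage
        chosen.append(best)
        covered |= gain

    return chosen, covered


if __name__ == "__main__":
    # Invented toy cohort: three tiles, four patients.
    tiles = {"tile_A": {1, 2, 3}, "tile_B": {3, 4}, "tile_C": {4}}
    print(greedy_tile_cover(tiles))
```

A greedy heuristic is simple to reason about but carries no guarantee of matching the solution of the convex program described in the abstract.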
Circulating tumor DNA levels correlate with response to treatment in LMS patients
AMER ASSOC CANCER RESEARCH. 2018: 38–39
View details for Web of Science ID 000422882000043
Macrophage infiltration and genetic landscape of undifferentiated uterine sarcomas.
2017; 2 (11)
Endometrial stromal tumors include translocation-associated low- and high-grade endometrial stromal sarcomas (ESS) and highly malignant undifferentiated uterine sarcomas (UUS). UUS is considered a poorly defined group of aggressive tumors and is often seen as a diagnosis of exclusion after ESS and leiomyosarcoma (LMS) have been ruled out. We performed a comprehensive analysis of gene expression, copy number variation, point mutations, and immune cell infiltrates in the largest series to date of all major types of uterine sarcomas to shed light on the biology of UUS and to identify potential novel therapeutic targets. We show that UUS tumors have a distinct molecular profile from LMS and ESS. Gene expression and immunohistochemical analyses revealed the presence of high numbers of tumor-associated macrophages (TAMs) in UUS, which makes UUS patients suitable candidates for therapies targeting TAMs. Our results show a high genomic instability of UUS and downregulation of several TP53-mediated tumor suppressor genes, such as NDN, CDH11, and NDRG4. Moreover, we demonstrate that UUS carry somatic mutations in several oncogenes and tumor suppressor genes implicated in RAS/PI3K/AKT/mTOR, ERBB3, and Hedgehog signaling.
View details for DOI 10.1172/jci.insight.94033
View details for PubMedID 28570276
Antigen presentation profiling reveals recognition of lymphoma immunoglobulin neoantigens
2017; 543 (7647): 723-?
Cancer somatic mutations can generate neoantigens that distinguish malignant from normal cells. However, the personalized identification and validation of neoantigens remains a major challenge. Here we discover neoantigens in human mantle-cell lymphomas by using an integrated genomic and proteomic strategy that interrogates tumour antigen peptides presented by major histocompatibility complex (MHC) class I and class II molecules. We applied this approach to systematically characterize MHC ligands from 17 patients. Remarkably, all discovered neoantigenic peptides were exclusively derived from the lymphoma immunoglobulin heavy- or light-chain variable regions. Although we identified MHC presentation of private polymorphic germline alleles, no mutated peptides were recovered from non-immunoglobulin somatically mutated genes. Somatic mutations within the immunoglobulin variable region were almost exclusively presented by MHC class II. We isolated circulating CD4(+) T cells specific for immunoglobulin-derived neoantigens and found these cells could mediate killing of autologous lymphoma cells. These results demonstrate that an integrative approach combining MHC isolation, peptide identification, and exome sequencing is an effective platform to uncover tumour neoantigens. Application of this strategy to human lymphoma implicates immunoglobulin neoantigens as targets for lymphoma immunotherapy.
View details for DOI 10.1038/nature21433
View details for PubMedID 28329770
Data normalization considerations for digital tumor dissection.
2017; 18 (1): 128
In a recently published article in Genome Biology, Li and colleagues introduced TIMER, a gene expression deconvolution approach for studying tumor-infiltrating leukocytes (TILs) in 23 cancer types profiled by The Cancer Genome Atlas. Methods to characterize TIL biology are increasingly important, and the authors offer several arguments in favor of their strategy. Several of these claims warrant further discussion and highlight the critical importance of data normalization in gene expression deconvolution applications. Please see related Li et al. correspondence: www.dx.doi.org/10.1186/s13059-017-1256-5 and Zheng correspondence: www.dx.doi.org/10.1186/s13059-017-1258-3.
View details for PubMedID 28679399
Targeted chromatin ligation, a robust epigenetic profiling technique for small cell numbers.
Nucleic acids research
2017; 45 (17): e153
The complexity and inefficiency of chromatin immunoprecipitation strategies restrict their sensitivity and application when examining rare cell populations. We developed a new technique that replaces immunoprecipitation with a simplified chromatin fragmentation and proximity ligation step that eliminates bead purification and washing steps. We present a simple single tube proximity ligation technique, targeted chromatin ligation, that captures histone modification patterns with only 200 cells. Our technique eliminates loss of material and sensitivity due to multiple inefficient steps, while simplifying the workflow to enhance sensitivity and create the potential for novel applications.
View details for PubMedID 28973448
Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling.
Identifying molecular residual disease (MRD) after treatment of localized lung cancer could facilitate early intervention and personalization of adjuvant therapies. Here we apply Cancer Personalized Profiling by Deep Sequencing (CAPP-Seq) circulating tumor DNA (ctDNA) analysis to 255 samples from 40 patients treated with curative intent for stage I-III lung cancer and 54 healthy adults. In 94% of evaluable patients experiencing recurrence, ctDNA was detectable in the first post-treatment blood sample, indicating reliable identification of MRD. Post-treatment ctDNA detection preceded radiographic progression in 72% of patients by a median of 5.2 months and 53% of patients harbored ctDNA mutation profiles associated with favorable responses to tyrosine kinase inhibitors or immune checkpoint blockade. Collectively, these results indicate that ctDNA MRD in lung cancer patients can be accurately detected using CAPP-Seq and may allow personalized adjuvant treatment while disease burden is lowest.
View details for PubMedID 28899864
Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (364)
Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.
View details for DOI 10.1126/scitranslmed.aai8545
View details for PubMedID 27831904
Role of KEAP1/NRF2 and TP53 Mutations in Lung Squamous Cell Carcinoma Development and Radiation Resistance.
Lung squamous cell carcinoma (LSCC) pathogenesis remains incompletely understood, and biomarkers predicting treatment response remain lacking. Here, we describe novel murine LSCC models driven by loss of Trp53 and Keap1, both of which are frequently mutated in human LSCCs. Homozygous inactivation of Keap1 or Trp53 promoted airway basal stem cell (ABSC) self-renewal, suggesting that mutations in these genes lead to expansion of mutant stem cell clones. Deletion of Trp53 and Keap1 in ABSCs, but not more differentiated tracheal cells, produced tumors recapitulating histologic and molecular features of human LSCCs, indicating that they represent the likely cell of origin in this model. Deletion of Keap1 promoted tumor aggressiveness, metastasis, and resistance to oxidative stress and radiotherapy (RT). KEAP1/NRF2 mutation status predicted risk of local recurrence after RT in patients with non-small cell lung cancer (NSCLC) and could be noninvasively identified in circulating tumor DNA. Thus, KEAP1/NRF2 mutations could serve as predictive biomarkers for personalization of therapeutic strategies for NSCLCs. We developed an LSCC mouse model involving Trp53 and Keap1, which are frequently mutated in human LSCCs. In this model, ABSCs are the cell of origin of these tumors. KEAP1/NRF2 mutations increase radioresistance and predict local tumor recurrence in radiotherapy patients. Our findings are of potential clinical relevance and could lead to personalized treatment strategies for tumors with KEAP1/NRF2 mutations. Cancer Discov; 7(1); 86-101. ©2016 AACR. This article is highlighted in the In This Issue feature, p. 1.
View details for PubMedID 27663899
High-throughput genomic profiling of tumor-infiltrating leukocytes.
Current opinion in immunology
2016; 41: 77-84
Tumors are complex ecosystems comprised of diverse cell types including malignant cells, mesenchymal cells, and tumor-infiltrating leukocytes (TILs). While TILs are well known to play important roles in many aspects of cancer biology, recent developments in immuno-oncology have spurred considerable interest in TILs, particularly in relation to their optimal engagement by emerging immunotherapies. Traditionally, the enumeration of TIL phenotypic diversity and composition in solid tumors has relied on resolving single cells by flow cytometry and immunohistochemical methods. However, advances in genome-wide technologies and computational methods are now allowing TILs to be profiled with increasingly high resolution and accuracy directly from RNA mixtures of bulk tumor samples. In this review, we highlight recent progress in the development of in silico tumor dissection methods, and illustrate examples of how these strategies can be applied to characterize TILs in human tumors to facilitate personalized cancer therapy.
View details for DOI 10.1016/j.coi.2016.06.006
View details for PubMedID 27372732
Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients
Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.
View details for DOI 10.1038/ncomms11815
View details for PubMedID 27283993
Identification of tumorigenic cells and therapeutic targets in pancreatic neuroendocrine tumors
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2016; 113 (16): 4464-4469
Pancreatic neuroendocrine tumors (PanNETs) are a type of pancreatic cancer with limited therapeutic options. Consequently, most patients with advanced disease die from tumor progression. Current evidence indicates that a subset of cancer cells is responsible for tumor development, metastasis, and recurrence, and targeting these tumor-initiating cells is necessary to eradicate tumors. However, tumor-initiating cells and the biological processes that promote pathogenesis remain largely uncharacterized in PanNETs. Here we profile primary and metastatic tumors from an index patient and demonstrate that MET proto-oncogene activation is important for tumor growth in PanNET xenograft models. We identify a highly tumorigenic cell population within several independent surgically acquired PanNETs characterized by increased cell-surface protein CD90 expression and aldehyde dehydrogenase A1 (ALDHA1) activity, and provide in vitro and in vivo evidence for their stem-like properties. We performed proteomic profiling of 332 antigens in two cell lines and four primary tumors, and showed that CD47, a cell-surface protein that acts as a "don't eat me" signal co-opted by cancers to evade innate immune surveillance, is ubiquitously expressed. Moreover, CD47 coexpresses with MET and is enriched in CD90(hi) cells. Furthermore, blocking CD47 signaling promotes engulfment of tumor cells by macrophages in vitro and inhibits xenograft tumor growth, prevents metastases, and prolongs survival in vivo.
View details for DOI 10.1073/pnas.1600007113
View details for PubMedID 27035983
Skin fibrosis. Identification and isolation of a dermal lineage with intrinsic fibrogenic potential.
2015; 348 (6232)
Dermal fibroblasts represent a heterogeneous population of cells with diverse features that remain largely undefined. We reveal the presence of at least two fibroblast lineages in murine dorsal skin. Lineage tracing and transplantation assays demonstrate that a single fibroblast lineage is responsible for the bulk of connective tissue deposition during embryonic development, cutaneous wound healing, radiation fibrosis, and cancer stroma formation. Lineage-specific cell ablation leads to diminished connective tissue deposition in wounds and reduces melanoma growth. Using flow cytometry, we identify CD26/DPP4 as a surface marker that allows isolation of this lineage. Small molecule-based inhibition of CD26/DPP4 enzymatic activity during wound healing results in diminished cutaneous scarring. Identification and isolation of these lineages hold promise for translational medicine aimed at in vivo modulation of fibrogenic behavior.
View details for DOI 10.1126/science.aaa2151
View details for PubMedID 25883361
View details for PubMedCentralID PMC5088503
Potential clinical utility of ultrasensitive circulating tumor DNA detection with CAPP-Seq.
Expert review of molecular diagnostics
Tumors continually shed DNA into the circulation, where it can be noninvasively accessed. The ability to accurately detect circulating tumor DNA (ctDNA) could significantly impact the management of patients with nearly every cancer type. Quantitation of ctDNA could allow objective response assessment, detection of minimal residual disease and noninvasive tumor genotyping. The latter application overcomes the barriers currently limiting repeated tumor tissue sampling during therapy. Recent technical advancements have improved upon the sensitivity, specificity and feasibility of ctDNA detection and promise to enable innovative clinical applications. Here, we focus on the potential clinical utility of ctDNA analysis using CAncer Personalized Profiling by deep Sequencing (CAPP-Seq), a novel next-generation sequencing-based approach for ultrasensitive ctDNA detection. Applications of CAPP-Seq for the personalization of cancer detection and therapy are discussed.
View details for DOI 10.1586/14737159.2015.1019476
View details for PubMedID 25773944
Large-Scale and Comprehensive Immune Profiling and Functional Analysis of Normal Human Aging.
2015; 10 (7)
While many age-associated immune changes have been reported, a comprehensive set of metrics of immune aging is lacking. Here we report data from 243 healthy adults aged 40-97, for whom we measured clinical and functional parameters, serum cytokines, cytokines and gene expression in stimulated and unstimulated PBMC, PBMC phenotypes, and cytokine-stimulated pSTAT signaling in whole blood. Although highly heterogeneous across individuals, many of these assays revealed trends by age, sex, and CMV status, to greater or lesser degrees. Age, then sex and CMV status, showed the greatest impact on the immune system, as measured by the percentage of assay readouts with significant differences. An elastic net regression model could optimally predict age with 14 analytes from different assays. This reinforces the importance of multivariate analysis for defining a healthy immune system. These data provide a reference for others measuring immune parameters in older people.
View details for DOI 10.1371/journal.pone.0133627
View details for PubMedID 26197454
In vivo clonal analysis reveals lineage-restricted progenitor characteristics in mammalian kidney development, maintenance, and regeneration.
2014; 7 (4): 1270-1283
The mechanism and magnitude by which the mammalian kidney generates and maintains its proximal tubules, distal tubules, and collecting ducts remain controversial. Here, we use long-term in vivo genetic lineage tracing and clonal analysis of individual cells from kidneys undergoing development, maintenance, and regeneration. We show that the adult mammalian kidney undergoes continuous tubulogenesis via expansions of fate-restricted clones. Kidneys recovering from damage undergo tubulogenesis through expansions of clones with segment-specific borders, and renal spheres developing in vitro from individual cells maintain distinct, segment-specific fates. Analysis of mice derived by transfer of color-marked embryonic stem cells (ESCs) into uncolored blastocysts demonstrates that nephrons are polyclonal, developing from expansions of singly fated clones. Finally, we show that adult renal clones are derived from Wnt-responsive precursors, and their tracing in vivo generates tubules that are segment specific. Collectively, these analyses demonstrate that fate-restricted precursors functioning as unipotent progenitors continuously maintain and self-preserve the mouse kidney throughout life.
View details for DOI 10.1016/j.celrep.2014.04.018
View details for PubMedID 24835991
View details for PubMedCentralID PMC4425291
Efficient Selection of Biomineralizing DNA Aptamers Using Deep Sequencing and Population Clustering
2014; 8 (1): 387-395
View details for DOI 10.1021/nn404448s
Identifying Stem Cell Gene Expression Patterns and Phenotypic Networks with AutoSOME.
Methods in molecular biology (Clifton, N.J.)
2014; 1150: 115-130
Stem cells have the unique property of differentiation and self-renewal and play critical roles in normal development, tissue repair, and disease. To promote systems-wide analysis of cells and tissues, we developed AutoSOME, a machine-learning method for identifying coordinated gene expression patterns and correlated cellular phenotypes in whole-transcriptome data, without prior knowledge of cluster number or structure. Here, we present a facile primer demonstrating the use of AutoSOME for identification and characterization of stem cell gene expression signatures and for visualization of transcriptome networks using Cytoscape. This protocol should serve as a general foundation for gene expression cluster analysis of stem cells, with applications for studying pluripotency, multi-lineage potential, and neoplastic disease.
View details for DOI 10.1007/978-1-4939-0512-6_6
View details for PubMedID 24743993
FACTERA: a practical method for the discovery of genomic rearrangements at breakpoint resolution
View details for DOI 10.1093/bioinformatics/btu549
The genome sequence of the colonial chordate, Botryllus schlosseri.
Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration.
View details for DOI 10.7554/eLife.00569
View details for PubMedID 23840927
Systems-level analysis of age-related macular degeneration reveals global biomarkers and phenotype-specific functional networks
Please see related commentary: http://www.biomedcentral.com/1741-7015/10/21/abstract
Age-related macular degeneration (AMD) is a leading cause of blindness that affects the central region of the retinal pigmented epithelium (RPE), choroid, and neural retina. Initially characterized by an accumulation of sub-RPE deposits, AMD leads to progressive retinal degeneration, and in advanced cases, irreversible vision loss. Although genetic analysis, animal models, and cell culture systems have yielded important insights into AMD, the molecular pathways underlying AMD's onset and progression remain poorly delineated. We sought to better understand the molecular underpinnings of this devastating disease by performing the first comparative transcriptome analysis of AMD and normal human donor eyes. RPE-choroid and retina tissue samples were obtained from a common cohort of 31 normal, 26 AMD, and 11 potential pre-AMD human donor eyes. Transcriptome profiles were generated for macular and extramacular regions, and statistical and bioinformatic methods were employed to identify disease-associated gene signatures and functionally enriched protein association networks. Selected genes of high significance were validated using an independent donor cohort. We identified over 50 annotated genes enriched in cell-mediated immune responses that are globally over-expressed in RPE-choroid AMD phenotypes. Using a machine learning model and a second donor cohort, we show that the top 20 global genes are predictive of AMD clinical diagnosis. We also discovered functionally enriched gene sets in the RPE-choroid that delineate the advanced AMD phenotypes, neovascular AMD and geographic atrophy. Moreover, we identified a graded increase of transcript levels in the retina related to wound response, complement cascade, and neurogenesis that strongly correlates with decreased levels of phototransduction transcripts and increased AMD severity. Based on our findings, we assembled protein-protein interactomes that highlight functional networks likely to be involved in AMD pathogenesis. We discovered new global biomarkers and gene expression signatures of AMD. These results are consistent with a model whereby cell-based inflammatory responses represent a central feature of AMD etiology, and depending on genetics, environment, or stochastic factors, may give rise to the advanced AMD phenotypes characterized by angiogenesis and/or cell death. Genes regulating these immunological activities, along with numerous other genes identified here, represent promising new targets for AMD-directed therapeutics and diagnostics.
View details for DOI 10.1186/PREACCEPT-1418491035586234
View details for Web of Science ID 000314566500002
View details for PubMedID 22364233
A proteomic approach for the identification of novel lysine methyltransferase substrates
EPIGENETICS & CHROMATIN
Signaling via protein lysine methylation has been proposed to play a central role in the regulation of many physiologic and pathologic programs. In contrast to other post-translational modifications such as phosphorylation, proteome-wide approaches to investigate lysine methylation networks do not exist. In the current study, we used the ProtoArray® platform, containing over 9,500 human proteins, and developed and optimized a system for proteome-wide identification of novel methylation events catalyzed by the protein lysine methyltransferase (PKMT) SETD6. This enzyme had previously been shown to methylate the transcription factor RelA, but it was not known whether SETD6 had other substrates. By using two independent detection approaches, we identified novel candidate substrates for SETD6, and verified that all targets tested in vitro and in cells were genuine substrates. We describe a novel proteome-wide methodology for the identification of new PKMT substrates. This technological advance may lead to a better understanding of the enzymatic activity and substrate specificity of the large number (more than 50) PKMTs present in the human proteome, most of which are uncharacterized.
View details for DOI 10.1186/1756-8935-4-19
View details for PubMedID 22024134
Global Analysis of Proline-Rich Tandem Repeat Proteins Reveals Broad Phylogenetic Diversity in Plant Secretomes
2011; 6 (8)
Cell walls, constructed by precisely choreographed changes in the plant secretome, play critical roles in plant cell physiology and development. Along with structural polysaccharides, secreted proline-rich Tandem Repeat Proteins (TRPs) are important for cell wall function, yet the evolutionary diversity of these structural TRPs remains virtually unexplored. Using a systems-level computational approach to analyze taxonomically diverse plant sequence data, we identified 31 distinct Pro-rich TRP classes targeted for secretion. This analysis expands upon the known phylogenetic diversity of extensins, the most widely studied class of wall structural proteins, and demonstrates that extensins evolved before plant vascularization. Our results also show that most Pro-rich TRP classes have unexpectedly restricted evolutionary distributions, revealing considerable differences in plant secretome signatures that define unexplored diversity.
View details for DOI 10.1371/journal.pone.0023167
View details for Web of Science ID 000293511900032
View details for PubMedID 21829715
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011; 12: 436
View details for DOI 10.1186/1471-2105-12-436
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
View details for DOI 10.1186/1471-2105-11-117
View details for Web of Science ID 000276296100002
View details for PubMedID 20202218
XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences
Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets.
View details for DOI 10.1186/1471-2105-8-382
View details for Web of Science ID 000252936900001
View details for PubMedID 17931424
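To make the notion of a tandem repeat in the XSTREAM abstract above concrete, here is a minimal sketch that finds exact tandem repeats in a sequence. It is not XSTREAM's algorithm: it handles no degeneracy, merging, nesting, or architecture modeling, and the function name `find_exact_tandem_repeats`, its parameters, and the toy peptide are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def find_exact_tandem_repeats(seq: str, min_period: int = 1, max_period: int = 10,
                              min_copies: int = 2) -> List[Tuple[int, str, int]]:
    """Scan left to right and return (start, unit, copies) for exact tandem
    repeats. At each position the repeat spanning the most residues is kept,
    and the scan resumes after it. Naive illustration only, not XSTREAM."""
    hits: List[Tuple[int, str, int]] = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        for period in range(max(1, min_period), max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for this period
            copies = 1
            # Count consecutive exact copies of the unit.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            span = copies * period
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the reported repeat
        else:
            i += 1
    return hits


# Hypothetical toy peptide containing three exact copies of "KPPVY".
repeats = find_exact_tandem_repeats("MKPPVYKPPVYKPPVYAG")
```

Real protein repeats are usually degenerate (copies differ by substitutions or indels), which is the case the abstract says XSTREAM was designed to handle and this exact-match sketch does not.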