Assistant Professor, Biomedical Data Science
Honors & Awards
K99/R00 Pathway to Independence Award, NIH/NCI (2015 - Present)
Visionary Postdoctoral Fellowship, Dept. of Defense (2012 - 2015)
NIH T32 Cancer Biology Training Grant, Stanford University (2012)
PhD, University of California, Santa Barbara, Biomolecular Science and Engineering Program (2010)
Integrated digital error suppression for improved detection of circulating tumor DNA
View details for DOI 10.1038/nbt.3520
Robust enumeration of cell subsets from tissue expression profiles
View details for DOI 10.1038/nmeth.3337
The prognostic landscape of genes and infiltrating immune cells across human cancers
View details for DOI 10.1038/nm.3909
An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage
NATURE MEDICINE
2014; 20 (5): 552-558
FACTERA: a practical method for the discovery of genomic rearrangements at breakpoint resolution
View details for DOI 10.1093/bioinformatics/btu549
Identification of a colonial chordate histocompatibility gene
2013; 341 (6144): 384-387
View details for DOI 10.1126/science.1238036
Lab-Specific Gene Expression Signatures in Pluripotent Stem Cells
CELL STEM CELL
2010; 7 (2): 258-262
Pluripotent stem cells derived from both embryonic and reprogrammed somatic cells have significant potential for human regenerative medicine. Despite similarities in developmental potential, however, several groups have found fundamental differences between embryonic stem cell (ESC) and induced-pluripotent stem cell (iPSC) lines that may have important implications for iPSC-based medical therapies. Using an unsupervised clustering algorithm, we further studied the genetic homogeneity of iPSC and ESC lines by reanalyzing microarray gene expression data from seven different laboratories. Unexpectedly, this analysis revealed a strong correlation between gene expression signatures and specific laboratories in both ESC and iPSC lines. Nearly one-third of the genes with lab-specific expression signatures are also differentially expressed between ESCs and iPSCs. These data are consistent with the hypothesis that in vitro microenvironmental context differentially impacts the gene expression signatures of both iPSCs and ESCs.
View details for DOI 10.1016/j.stem.2010.06.016
View details for Web of Science ID 000281107400017
View details for PubMedID 20682451
Macrophage infiltration and genetic landscape of undifferentiated uterine sarcomas.
2017; 2 (11)
Endometrial stromal tumors include translocation-associated low- and high-grade endometrial stromal sarcomas (ESS) and highly malignant undifferentiated uterine sarcomas (UUS). UUS is considered a poorly defined group of aggressive tumors and is often seen as a diagnosis of exclusion after ESS and leiomyosarcoma (LMS) have been ruled out. We performed a comprehensive analysis of gene expression, copy number variation, point mutations, and immune cell infiltrates in the largest series to date of all major types of uterine sarcomas to shed light on the biology of UUS and to identify potential novel therapeutic targets. We show that UUS tumors have a distinct molecular profile from LMS and ESS. Gene expression and immunohistochemical analyses revealed the presence of high numbers of tumor-associated macrophages (TAMs) in UUS, which makes UUS patients suitable candidates for therapies targeting TAMs. Our results show a high genomic instability of UUS and downregulation of several TP53-mediated tumor suppressor genes, such as NDN, CDH11, and NDRG4. Moreover, we demonstrate that UUS carry somatic mutations in several oncogenes and tumor suppressor genes implicated in RAS/PI3K/AKT/mTOR, ERBB3, and Hedgehog signaling.
View details for DOI 10.1172/jci.insight.94033
View details for PubMedID 28570276
Antigen presentation profiling reveals recognition of lymphoma immunoglobulin neoantigens
2017; 543 (7647): 723-?
Cancer somatic mutations can generate neoantigens that distinguish malignant from normal cells. However, the personalized identification and validation of neoantigens remains a major challenge. Here we discover neoantigens in human mantle-cell lymphomas by using an integrated genomic and proteomic strategy that interrogates tumour antigen peptides presented by major histocompatibility complex (MHC) class I and class II molecules. We applied this approach to systematically characterize MHC ligands from 17 patients. Remarkably, all discovered neoantigenic peptides were exclusively derived from the lymphoma immunoglobulin heavy- or light-chain variable regions. Although we identified MHC presentation of private polymorphic germline alleles, no mutated peptides were recovered from non-immunoglobulin somatically mutated genes. Somatic mutations within the immunoglobulin variable region were almost exclusively presented by MHC class II. We isolated circulating CD4(+) T cells specific for immunoglobulin-derived neoantigens and found these cells could mediate killing of autologous lymphoma cells. These results demonstrate that an integrative approach combining MHC isolation, peptide identification, and exome sequencing is an effective platform to uncover tumour neoantigens. Application of this strategy to human lymphoma implicates immunoglobulin neoantigens as targets for lymphoma immunotherapy.
View details for DOI 10.1038/nature21433
View details for Web of Science ID 000397619700057
View details for PubMedID 28329770
Data normalization considerations for digital tumor dissection.
2017; 18 (1): 128
In a recently published article in Genome Biology, Li and colleagues introduced TIMER, a gene expression deconvolution approach for studying tumor-infiltrating leukocytes (TILs) in 23 cancer types profiled by The Cancer Genome Atlas. Methods to characterize TIL biology are increasingly important, and the authors offer several arguments in favor of their strategy. Several of these claims warrant further discussion and highlight the critical importance of data normalization in gene expression deconvolution applications. Please see related Li et al correspondence: www.dx.doi.org/10.1186/s13059-017-1256-5 and Zheng correspondence: www.dx.doi.org/10.1186/s13059-017-1258-3.
View details for DOI 10.1186/s13059-017-1257-4
View details for PubMedID 28679399
Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (364)
Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy.
View details for DOI 10.1126/scitranslmed.aai8545
View details for Web of Science ID 000389448100006
View details for PubMedID 27831904
Role of KEAP1/NRF2 and TP53 Mutations in Lung Squamous Cell Carcinoma Development and Radiation Resistance.
Lung squamous cell carcinoma (LSCC) pathogenesis remains incompletely understood, and biomarkers predicting treatment response remain lacking. Here, we describe novel murine LSCC models driven by loss of Trp53 and Keap1, both of which are frequently mutated in human LSCCs. Homozygous inactivation of Keap1 or Trp53 promoted airway basal stem cell (ABSC) self-renewal, suggesting that mutations in these genes lead to expansion of mutant stem cell clones. Deletion of Trp53 and Keap1 in ABSCs, but not more differentiated tracheal cells, produced tumors recapitulating histologic and molecular features of human LSCCs, indicating that they represent the likely cell of origin in this model. Deletion of Keap1 promoted tumor aggressiveness, metastasis, and resistance to oxidative stress and radiotherapy (RT). KEAP1/NRF2 mutation status predicted risk of local recurrence after RT in patients with non-small cell lung cancer (NSCLC) and could be noninvasively identified in circulating tumor DNA. Thus, KEAP1/NRF2 mutations could serve as predictive biomarkers for personalization of therapeutic strategies for NSCLCs. We developed an LSCC mouse model involving Trp53 and Keap1, which are frequently mutated in human LSCCs. In this model, ABSCs are the cell of origin of these tumors. KEAP1/NRF2 mutations increase radioresistance and predict local tumor recurrence in radiotherapy patients. Our findings are of potential clinical relevance and could lead to personalized treatment strategies for tumors with KEAP1/NRF2 mutations. Cancer Discov; 7(1); 86-101. ©2016 AACR. This article is highlighted in the In This Issue feature, p. 1.
View details for PubMedID 27663899
View details for PubMedCentralID PMC5222718
High-throughput genomic profiling of tumor-infiltrating leukocytes.
Current opinion in immunology
2016; 41: 77-84
Tumors are complex ecosystems comprised of diverse cell types including malignant cells, mesenchymal cells, and tumor-infiltrating leukocytes (TILs). While TILs are well known to play important roles in many aspects of cancer biology, recent developments in immuno-oncology have spurred considerable interest in TILs, particularly in relation to their optimal engagement by emerging immunotherapies. Traditionally, the enumeration of TIL phenotypic diversity and composition in solid tumors has relied on resolving single cells by flow cytometry and immunohistochemical methods. However, advances in genome-wide technologies and computational methods are now allowing TILs to be profiled with increasingly high resolution and accuracy directly from RNA mixtures of bulk tumor samples. In this review, we highlight recent progress in the development of in silico tumor dissection methods, and illustrate examples of how these strategies can be applied to characterize TILs in human tumors to facilitate personalized cancer therapy.
View details for DOI 10.1016/j.coi.2016.06.006
View details for PubMedID 27372732
Circulating tumour DNA profiling reveals heterogeneity of EGFR inhibitor resistance mechanisms in lung cancer patients
Circulating tumour DNA (ctDNA) analysis facilitates studies of tumour heterogeneity. Here we employ CAPP-Seq ctDNA analysis to study resistance mechanisms in 43 non-small cell lung cancer (NSCLC) patients treated with the third-generation epidermal growth factor receptor (EGFR) inhibitor rociletinib. We observe multiple resistance mechanisms in 46% of patients after treatment with first-line inhibitors, indicating frequent intra-patient heterogeneity. Rociletinib resistance recurrently involves MET, EGFR, PIK3CA, ERBB2, KRAS and RB1. We describe a novel EGFR L798I mutation and find that EGFR C797S, which arises in ∼33% of patients after osimertinib treatment, occurs in <3% after rociletinib. Increased MET copy number is the most frequent rociletinib resistance mechanism in this cohort and patients with multiple pre-existing mechanisms (T790M and MET) experience inferior responses. Similarly, rociletinib-resistant xenografts develop MET amplification that can be overcome with the MET inhibitor crizotinib. These results underscore the importance of tumour heterogeneity in NSCLC and the utility of ctDNA-based resistance mechanism assessment.
View details for DOI 10.1038/ncomms11815
View details for Web of Science ID 000378007200001
View details for PubMedID 27283993
View details for PubMedCentralID PMC4906406
Identification of tumorigenic cells and therapeutic targets in pancreatic neuroendocrine tumors
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2016; 113 (16): 4464-4469
Pancreatic neuroendocrine tumors (PanNETs) are a type of pancreatic cancer with limited therapeutic options. Consequently, most patients with advanced disease die from tumor progression. Current evidence indicates that a subset of cancer cells is responsible for tumor development, metastasis, and recurrence, and targeting these tumor-initiating cells is necessary to eradicate tumors. However, tumor-initiating cells and the biological processes that promote pathogenesis remain largely uncharacterized in PanNETs. Here we profile primary and metastatic tumors from an index patient and demonstrate that MET proto-oncogene activation is important for tumor growth in PanNET xenograft models. We identify a highly tumorigenic cell population within several independent surgically acquired PanNETs characterized by increased cell-surface protein CD90 expression and aldehyde dehydrogenase A1 (ALDHA1) activity, and provide in vitro and in vivo evidence for their stem-like properties. We performed proteomic profiling of 332 antigens in two cell lines and four primary tumors, and showed that CD47, a cell-surface protein that acts as a "don't eat me" signal co-opted by cancers to evade innate immune surveillance, is ubiquitously expressed. Moreover, CD47 coexpresses with MET and is enriched in CD90(hi) cells. Furthermore, blocking CD47 signaling promotes engulfment of tumor cells by macrophages in vitro and inhibits xenograft tumor growth, prevents metastases, and prolongs survival in vivo.
View details for DOI 10.1073/pnas.1600007113
View details for Web of Science ID 000374393800063
View details for PubMedID 27035983
View details for PubMedCentralID PMC4843455
Skin fibrosis. Identification and isolation of a dermal lineage with intrinsic fibrogenic potential.
2015; 348 (6232)
Dermal fibroblasts represent a heterogeneous population of cells with diverse features that remain largely undefined. We reveal the presence of at least two fibroblast lineages in murine dorsal skin. Lineage tracing and transplantation assays demonstrate that a single fibroblast lineage is responsible for the bulk of connective tissue deposition during embryonic development, cutaneous wound healing, radiation fibrosis, and cancer stroma formation. Lineage-specific cell ablation leads to diminished connective tissue deposition in wounds and reduces melanoma growth. Using flow cytometry, we identify CD26/DPP4 as a surface marker that allows isolation of this lineage. Small molecule-based inhibition of CD26/DPP4 enzymatic activity during wound healing results in diminished cutaneous scarring. Identification and isolation of these lineages hold promise for translational medicine aimed at in vivo modulation of fibrogenic behavior.
View details for DOI 10.1126/science.aaa2151
View details for PubMedID 25883361
View details for PubMedCentralID PMC5088503
Potential clinical utility of ultrasensitive circulating tumor DNA detection with CAPP-Seq.
Expert review of molecular diagnostics
Tumors continually shed DNA into the circulation, where it can be noninvasively accessed. The ability to accurately detect circulating tumor DNA (ctDNA) could significantly impact the management of patients with nearly every cancer type. Quantitation of ctDNA could allow objective response assessment, detection of minimal residual disease and noninvasive tumor genotyping. The latter application overcomes the barriers currently limiting repeated tumor tissue sampling during therapy. Recent technical advancements have improved upon the sensitivity, specificity and feasibility of ctDNA detection and promise to enable innovative clinical applications. Here, we focus on the potential clinical utility of ctDNA analysis using CAncer Personalized Profiling by deep Sequencing (CAPP-Seq), a novel next-generation sequencing-based approach for ultrasensitive ctDNA detection. Applications of CAPP-Seq for the personalization of cancer detection and therapy are discussed.
View details for DOI 10.1586/14737159.2015.1019476
View details for PubMedID 25773944
Large-Scale and Comprehensive Immune Profiling and Functional Analysis of Normal Human Aging.
2015; 10 (7)
While many age-associated immune changes have been reported, a comprehensive set of metrics of immune aging is lacking. Here we report data from 243 healthy adults aged 40-97, for whom we measured clinical and functional parameters, serum cytokines, cytokines and gene expression in stimulated and unstimulated PBMC, PBMC phenotypes, and cytokine-stimulated pSTAT signaling in whole blood. Although highly heterogeneous across individuals, many of these assays revealed trends by age, sex, and CMV status, to greater or lesser degrees. Age, then sex and CMV status, showed the greatest impact on the immune system, as measured by the percentage of assay readouts with significant differences. An elastic net regression model could optimally predict age with 14 analytes from different assays. This reinforces the importance of multivariate analysis for defining a healthy immune system. These data provide a reference for others measuring immune parameters in older people.
View details for DOI 10.1371/journal.pone.0133627
View details for PubMedID 26197454
In Vivo clonal analysis reveals lineage-restricted progenitor characteristics in Mammalian kidney development, maintenance, and regeneration.
2014; 7 (4): 1270-1283
The mechanism and magnitude by which the mammalian kidney generates and maintains its proximal tubules, distal tubules, and collecting ducts remain controversial. Here, we use long-term in vivo genetic lineage tracing and clonal analysis of individual cells from kidneys undergoing development, maintenance, and regeneration. We show that the adult mammalian kidney undergoes continuous tubulogenesis via expansions of fate-restricted clones. Kidneys recovering from damage undergo tubulogenesis through expansions of clones with segment-specific borders, and renal spheres developing in vitro from individual cells maintain distinct, segment-specific fates. Analysis of mice derived by transfer of color-marked embryonic stem cells (ESCs) into uncolored blastocysts demonstrates that nephrons are polyclonal, developing from expansions of singly fated clones. Finally, we show that adult renal clones are derived from Wnt-responsive precursors, and their tracing in vivo generates tubules that are segment specific. Collectively, these analyses demonstrate that fate-restricted precursors functioning as unipotent progenitors continuously maintain and self-preserve the mouse kidney throughout life.
View details for DOI 10.1016/j.celrep.2014.04.018
View details for PubMedID 24835991
View details for PubMedCentralID PMC4425291
Efficient Selection of Biomineralizing DNA Aptamers Using Deep Sequencing and Population Clustering
2014; 8 (1): 387-395
View details for DOI 10.1021/nn404448s
Identifying Stem Cell Gene Expression Patterns and Phenotypic Networks with AutoSOME.
Methods in molecular biology (Clifton, N.J.)
2014; 1150: 115-130
Stem cells have the unique property of differentiation and self-renewal and play critical roles in normal development, tissue repair, and disease. To promote systems-wide analysis of cells and tissues, we developed AutoSOME, a machine-learning method for identifying coordinated gene expression patterns and correlated cellular phenotypes in whole-transcriptome data, without prior knowledge of cluster number or structure. Here, we present a facile primer demonstrating the use of AutoSOME for identification and characterization of stem cell gene expression signatures and for visualization of transcriptome networks using Cytoscape. This protocol should serve as a general foundation for gene expression cluster analysis of stem cells, with applications for studying pluripotency, multi-lineage potential, and neoplastic disease.
View details for DOI 10.1007/978-1-4939-0512-6_6
View details for PubMedID 24743993
The genome sequence of the colonial chordate, Botryllus schlosseri.
Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration.
View details for DOI 10.7554/eLife.00569
View details for PubMedID 23840927
Systems-level analysis of age-related macular degeneration reveals global biomarkers and phenotype-specific functional networks
Please see related commentary: http://www.biomedcentral.com/1741-7015/10/21/abstract
Age-related macular degeneration (AMD) is a leading cause of blindness that affects the central region of the retinal pigmented epithelium (RPE), choroid, and neural retina. Initially characterized by an accumulation of sub-RPE deposits, AMD leads to progressive retinal degeneration, and in advanced cases, irreversible vision loss. Although genetic analysis, animal models, and cell culture systems have yielded important insights into AMD, the molecular pathways underlying AMD's onset and progression remain poorly delineated. We sought to better understand the molecular underpinnings of this devastating disease by performing the first comparative transcriptome analysis of AMD and normal human donor eyes. RPE-choroid and retina tissue samples were obtained from a common cohort of 31 normal, 26 AMD, and 11 potential pre-AMD human donor eyes. Transcriptome profiles were generated for macular and extramacular regions, and statistical and bioinformatic methods were employed to identify disease-associated gene signatures and functionally enriched protein association networks. Selected genes of high significance were validated using an independent donor cohort. We identified over 50 annotated genes enriched in cell-mediated immune responses that are globally over-expressed in RPE-choroid AMD phenotypes. Using a machine learning model and a second donor cohort, we show that the top 20 global genes are predictive of AMD clinical diagnosis. We also discovered functionally enriched gene sets in the RPE-choroid that delineate the advanced AMD phenotypes, neovascular AMD and geographic atrophy. Moreover, we identified a graded increase of transcript levels in the retina related to wound response, complement cascade, and neurogenesis that strongly correlates with decreased levels of phototransduction transcripts and increased AMD severity. Based on our findings, we assembled protein-protein interactomes that highlight functional networks likely to be involved in AMD pathogenesis. We discovered new global biomarkers and gene expression signatures of AMD. These results are consistent with a model whereby cell-based inflammatory responses represent a central feature of AMD etiology, and depending on genetics, environment, or stochastic factors, may give rise to the advanced AMD phenotypes characterized by angiogenesis and/or cell death. Genes regulating these immunological activities, along with numerous other genes identified here, represent promising new targets for AMD-directed therapeutics and diagnostics.
View details for DOI 10.1186/PREACCEPT-1418491035586234
View details for Web of Science ID 000314566500002
View details for PubMedID 22364233
A proteomic approach for the identification of novel lysine methyltransferase substrates
EPIGENETICS & CHROMATIN
Signaling via protein lysine methylation has been proposed to play a central role in the regulation of many physiologic and pathologic programs. In contrast to other post-translational modifications such as phosphorylation, proteome-wide approaches to investigate lysine methylation networks do not exist. In the current study, we used the ProtoArray® platform, containing over 9,500 human proteins, and developed and optimized a system for proteome-wide identification of novel methylation events catalyzed by the protein lysine methyltransferase (PKMT) SETD6. This enzyme had previously been shown to methylate the transcription factor RelA, but it was not known whether SETD6 had other substrates. By using two independent detection approaches, we identified novel candidate substrates for SETD6, and verified that all targets tested in vitro and in cells were genuine substrates. We describe a novel proteome-wide methodology for the identification of new PKMT substrates. This technological advance may lead to a better understanding of the enzymatic activity and substrate specificity of the large number (more than 50) PKMTs present in the human proteome, most of which are uncharacterized.
View details for DOI 10.1186/1756-8935-4-19
View details for Web of Science ID 000296832600001
View details for PubMedID 22024134
View details for PubMedCentralID PMC3212905
Global Analysis of Proline-Rich Tandem Repeat Proteins Reveals Broad Phylogenetic Diversity in Plant Secretomes
2011; 6 (8)
Cell walls, constructed by precisely choreographed changes in the plant secretome, play critical roles in plant cell physiology and development. Along with structural polysaccharides, secreted proline-rich Tandem Repeat Proteins (TRPs) are important for cell wall function, yet the evolutionary diversity of these structural TRPs remains virtually unexplored. Using a systems-level computational approach to analyze taxonomically diverse plant sequence data, we identified 31 distinct Pro-rich TRP classes targeted for secretion. This analysis expands upon the known phylogenetic diversity of extensins, the most widely studied class of wall structural proteins, and demonstrates that extensins evolved before plant vascularization. Our results also show that most Pro-rich TRP classes have unexpectedly restricted evolutionary distributions, revealing considerable differences in plant secretome signatures that define unexplored diversity.
View details for DOI 10.1371/journal.pone.0023167
View details for Web of Science ID 000293511900032
View details for PubMedID 21829715
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011; 12: 436
View details for DOI 10.1186/1471-2105-12-436
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
View details for DOI 10.1186/1471-2105-11-117
View details for Web of Science ID 000276296100002
View details for PubMedID 20202218
XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences
Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets.
View details for DOI 10.1186/1471-2105-8-382
View details for Web of Science ID 000252936900001
View details for PubMedID 17931424