Clinical Focus

  • Cancer > GI Oncology
  • Medical Oncology
  • Oncology (Cancer)
  • Gastrointestinal Neoplasms
  • Inherited Cancer Disorders
  • Immunotherapy in gastrointestinal cancers

Administrative Appointments

  • Senior Associate Director, Stanford Genome Technology Center (2008 - Present)

Honors & Awards

  • Physician-Scientist Fellowship Award, Howard Hughes Medical Institute (1998)
  • Scholar-in-Training Award for Research Achievement, American Association for Cancer Research (2005)
  • Merit Award for Research Achievement, American Society of Clinical Oncology Foundation (2006)
  • Physician Scientist Early Career Award, Howard Hughes Medical Institute (2008)
  • Clinical Scientist Development Award, Doris Duke Charitable Foundation (2009)
  • Research Scholar Award, American Cancer Society (2013)

Professional Education

  • Residency: University of Iowa Hospitals and Clinics GME Training Verifications (1996) IA
  • Medical Education: Johns Hopkins University School of Medicine (1994) MD
  • Fellowship: Stanford University Hospital - Clinical Excellence Research Center (2005) CA
  • Board Certification: Medical Oncology, American Board of Internal Medicine (2004)
  • Residency: University of Washington (2001) WA
  • B.A., Reed College, Biology
  • M.D., Johns Hopkins University, Medicine

Current Research and Scholarly Interests

Our research group integrates new molecular technology development, advanced computational methods and genome biology to identify targets for cancer therapy. We are pursuing projects focused on developing new therapies for stomach, bile duct and colon cancer. We are also studying the basis of genomic instability by examining chromosome structure.

Ongoing projects include:

1) Immunogenomic approaches to study cancer's interaction with the immune system and improve our understanding of immunotherapy

2) Identification of kinase interactions which can improve targeted therapy strategies

3) Use of advanced genome sequencing technologies including nanopore sequencers to understand the role of cancer rearrangements in response to therapy

4) Identifying genes that increase the risk of developing cancer

5) Developing new approaches for monitoring cancer from circulating DNA

We are also developing new technologies for storing data in DNA.

Clinical Trials

  • Clinical & Pathological Studies of Upper Gastrointestinal Carcinoma (Recruiting)

    Our research on the biology of upper gastrointestinal cancers involves studying tissue samples and cells from biopsies of persons with gastric or esophageal cancer, as well as blood samples from upper gastrointestinal cancer patients and from persons at high inherited risk for these cancers. We hope to learn the roles that genes and proteins play in the development of gastric and esophageal cancer.

  • The Gastric Cancer Foundation: A Gastric Cancer Registry (Recruiting)

    The Gastric Cancer Registry will combine data acquired directly from participants via an online questionnaire with genomic data obtained from saliva, blood and tissue samples. Participants include patients with gastric cancer, persons with a family history of gastric cancer in a first- or second-degree relative, and persons with a known germline mutation in the CDH1 (E-cadherin) gene. The purpose of this registry is to gain a better understanding of the causes of gastric cancer, both environmental and genetic, and of whether certain genomic data can predict treatment outcomes and survival.

All Publications

  • Structural variant analysis for linked-read sequencing data with gemtools. Bioinformatics (Oxford, England) Greer, S. U., Ji, H. P. 2019


    SUMMARY: Linked-read sequencing generates synthetic long reads that are useful for detecting and analyzing structural variants (SVs). The software associated with 10X Genomics linked-read sequencing, Long Ranger, generates the essential output files (BAM, VCF, SV BEDPE) needed for downstream analyses. However, downstream analyses require users to build their own custom tools to handle the unique features of linked-read sequencing data. Here, we describe gemtools, a collection of tools for the downstream, in-depth analysis of structural variants from linked-read data. Gemtools uses the barcoded aligned reads and the megabase-scale phase blocks to determine the haplotypes of structural variant breakpoints and to delineate complex breakpoint configurations at the resolution of single DNA molecules. The gemtools package is a suite of tools that gives users the flexibility to perform basic functions on their linked-read sequencing output in order to address a wider range of questions.

    AVAILABILITY AND IMPLEMENTATION: The gemtools package is freely available for download at:

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    DOI: 10.1093/bioinformatics/btz239

    PubMed ID: 30938757
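    To make the breakpoint-phasing idea above concrete, here is a minimal Python sketch, assuming that each barcode supporting an SV breakpoint can be looked up in a precomputed barcode-to-haplotype table built from phased reads in the same phase block. It is not gemtools code; `assign_sv_haplotype` and its `min_fraction` cutoff are hypothetical names chosen for illustration.

```python
from collections import Counter


def assign_sv_haplotype(sv_barcodes, barcode_haplotype, min_fraction=0.8):
    """Phase an SV breakpoint by a majority vote of its supporting barcodes.

    Hypothetical helper for illustration (not part of gemtools).
    sv_barcodes: barcodes of molecules that support the breakpoint.
    barcode_haplotype: dict mapping barcode -> 1 or 2, assumed to come from
        phased heterozygous SNVs in the same phase block.
    Returns 1 or 2 when one haplotype holds at least min_fraction of the
    phased supporting barcodes, otherwise None (unphased or ambiguous).
    """
    votes = Counter(
        barcode_haplotype[bc] for bc in sv_barcodes if bc in barcode_haplotype
    )
    total = sum(votes.values())
    if total == 0:
        return None  # no supporting barcode carries phase information
    hap, count = votes.most_common(1)[0]
    return hap if count / total >= min_fraction else None


# Three of four supporting barcodes are phased, all to haplotype 1.
phase = {"BC1": 1, "BC2": 1, "BC3": 1, "BC9": 2}
print(assign_sv_haplotype({"BC1", "BC2", "BC3", "BC4"}, phase))
```

    The cutoff leaves an SV unphased when its supporting molecules are split between haplotypes, rather than forcing a call.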

  • Single-cell transcriptome analysis identifies distinct cell types and niche signaling in a primary gastric organoid model. Scientific reports Chen, J., Lau, B. T., Andor, N., Grimes, S. M., Handy, C., Wood-Bouwens, C., Ji, H. P. 2019; 9 (1): 4536


    The diverse cellular milieu of the gastric tissue microenvironment plays a critical role in normal tissue homeostasis and tumor development. However, few cell culture models can recapitulate the tissue microenvironment and intercellular signaling in vitro. We used a primary tissue culture system to generate a murine p53-null gastric tissue model containing both epithelium and mesenchymal stroma. To characterize the microenvironment and niche signaling, we used single-cell RNA sequencing (scRNA-Seq) to determine the transcriptomes of 4,391 individual cells. Based on specific markers, we identified epithelial cells, fibroblasts and macrophages in initial tissue explants during organoid formation. The majority of macrophages were polarized toward the wound-healing, tumor-promoting M2 type. Over time, the organoids maintained both epithelial and fibroblast lineages with the features of immature mouse stomach. We detected a subset of cells in both lineages expressing Lgr5, a stem cell marker. We examined lineage-specific Wnt signaling activation and found that Rspo3 was specifically expressed in the fibroblast lineage, providing an endogenous source of R-spondin to activate Wnt signaling. Our studies demonstrate that this primary tissue culture system enables the study of gastric tissue niche signaling and immune response in vitro.

    DOI: 10.1038/s41598-019-40809-x

    PubMed ID: 30872643

  • Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2. Nucleic acids research Zhou, B., Ho, S. S., Greer, S. U., Spies, N., Bell, J. M., Zhang, X., Zhu, X., Arthur, J. G., Byeon, S., Pattni, R., Saha, I., Huang, Y., Song, G., Perrin, D., Wong, W. H., Ji, H. P., Abyzov, A., Urban, A. E. 2019


    HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 have been extensively studied, its genome sequence has never been comprehensively analyzed, and its higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence-assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.

    DOI: 10.1093/nar/gkz169

    PubMed ID: 30864654

  • Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562. Genome research Zhou, B., Ho, S. S., Greer, S. U., Zhu, X., Bell, J. M., Arthur, J. G., Spies, N., Zhang, X., Byeon, S., Pattni, R., Ben-Efraim, N., Haney, M. S., Haraksingh, R. R., Song, G., Ji, H. P., Perrin, D., Wong, W. H., Abyzov, A., Urban, A. E. 2019


    K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs, and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.

    DOI: 10.1101/gr.234948.118

    PubMed ID: 30737237

  • Covalent 'click chemistry'-based attachment of DNA onto solid phase enables iterative molecular analysis. Analytical chemistry Lau, B. T., Ji, H. P. 2019


    Molecular analysis of DNA samples available in limited quantities can be challenging. Repeatedly sequencing the original DNA molecules from a given sample would overcome many obstacles to accurate genetic analysis and mitigate the difficulties of processing small amounts of DNA analyte. Moreover, an iterative, replicated analysis of the same DNA molecule has the potential to improve genetic characterization. Herein, we demonstrate that 'click'-based attachment of DNA sequencing libraries onto an agarose bead support enables repetitive primer extension assays for specific genomic DNA targets such as gene exons. We validated the performance of this assay for evaluating specific genetic alterations in both normal and cancer reference standard DNA samples. We demonstrate the stability of conjugated DNA libraries and related sequencing results over the course of independent serial assays spanning several months from the same set of samples. Finally, we applied this method to DNA derived from a tumor sample and demonstrated improved mutation detection accuracy.

    DOI: 10.1021/acs.analchem.8b05139

    PubMed ID: 30652472

  • Single-cell RNA-Seq of lymphoma cancers reveals malignant B cell types and co-expression of T cell immune checkpoints. Blood Andor, N., Simonds, E. F., Czerwinski, D. K., Chen, J., Grimes, S. M., Wood-Bouwens, C., Zheng, G. X., Kubit, M. A., Greer, S., Weiss, W. A., Levy, R., Ji, H. P. 2018


    Follicular lymphoma (FL) is a low-grade B cell malignancy that transforms into a highly aggressive and lethal disease at a rate of 2% per year. Isolating a pure population of malignant B cells from a surgical biopsy is a significant challenge, and this difficulty masks important FL biology, such as immune checkpoint co-expression patterns. To resolve the underlying transcriptional networks of follicular B cell lymphomas, we analyzed the transcriptomes of 34,188 cells derived from six primary FL tumors. For each tumor, we identified normal immune subpopulations and malignant B cells based on gene expression. We used multicolor flow cytometry analysis of the same tumors to confirm our assignments of cellular lineages and validate our predictions of expressed proteins. Comparison of gene expression between matched malignant and normal B cells from the same patient revealed tumor-specific features. Malignant B cells exhibited restricted immunoglobulin light chain expression (either Ig Kappa or Ig Lambda), as well as the expected upregulation of the BCL2 gene, but also downregulation of the FCER2, CD52 and MHC class II genes. By analyzing thousands of individual cells per patient tumor, we identified the mosaic of malignant B cell subclones that coexist within a FL and examined the characteristics of tumor-infiltrating T cells. We identified genes co-expressed with immune checkpoint molecules, such as CEBPA and B2M in Tregs, providing a better understanding of the gene networks involved in immune regulation. In summary, parallel measurement of single-cell expression in thousands of tumor cells and tumor-infiltrating lymphocytes can be used to obtain a systems-level view of the tumor microenvironment and identify new avenues for therapeutic development.

    DOI: 10.1182/blood-2018-08-862292

    PubMed ID: 30591526

  • Single Cell RNA Sequencing of Serial Tumor and Blood Biopsies from Lymphoma Patients on an in Situ Vaccination Clinical Trial Shree, T., Sathe, A., Czerwinski, D. K., Long, S. R., Ji, H., Levy, R. AMER SOC HEMATOLOGY. 2018
  • SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. GigaScience Xia, L. C., Ai, D., Lee, H., Andor, N., Li, C., Zhang, N. R., Ji, H. P. 2018


    Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, only a few data simulators provide structural variants in silico, and even fewer provide variants with different allelic fractions and haplotypes.

    Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of these files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions and translocations. SVEngine also simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce simulation time.

    Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime in comparison with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequence mapping features, such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs, for representative variant types. The structural variant callers Lumpy and Manta and the tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use at:

    DOI: 10.1093/gigascience/giy081

    PubMed ID: 29982625
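    The core idea of embedding a variant at a chosen allelic fraction can be sketched in a few lines of Python. This is an illustrative toy, not SVEngine's code or interface: the helper names are hypothetical, and it applies only one deletion and samples error-free, single-end reads, whereas SVEngine handles many variant types, insert-size distributions and clonal phylogenies.

```python
import random


def apply_deletion(seq, start, end):
    """Return seq with the half-open interval [start, end) deleted."""
    return seq[:start] + seq[end:]


def simulate_reads(ref, alt, allele_fraction, n_reads, read_len, seed=0):
    """Sample error-free, single-end reads from a two-haplotype mixture.

    Each read is drawn from the variant sequence `alt` with probability
    allele_fraction and from `ref` otherwise, at a uniform random start.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        source = alt if rng.random() < allele_fraction else ref
        start = rng.randrange(len(source) - read_len + 1)
        reads.append(source[start:start + read_len])
    return reads


ref = "ACGTACGTAA" + "CCCCCCCCCC" + "TTGGTTGGTT"
alt = apply_deletion(ref, 10, 20)  # drop the 10-bp run of C
reads = simulate_reads(ref, alt, allele_fraction=0.3, n_reads=1000, read_len=8)
```

    Setting `allele_fraction` below 0.5 mimics a subclonal variant, the kind of low-fraction event that structural variant callers must detect in tumors.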

  • Identification of large rearrangements in cancer genomes with barcode linked reads. Nucleic acids research Xia, L. C., Bell, J. M., Wood-Bouwens, C., Chen, J. J., Zhang, N. R., Ji, H. P. 2018; 46 (4): e19


    Large genomic rearrangements involve inversions, deletions and other structural changes that span megabase segments of the human genome. This category of genetic aberration is the cause of many hereditary genetic disorders and contributes to the pathogenesis of diseases like cancer. We developed a new algorithm called ZoomX for analysing barcode-linked sequence reads; these reads can be traced to individual high molecular weight DNA molecules (>50 kb). To generate barcode-linked sequence reads, we employ a library preparation technology (10X Genomics) that uses droplets to partition and barcode DNA molecules. Using linked-read data from whole genome sequencing, we identify large genomic rearrangements, typically greater than 200 kb, even when they are present only at low allelic fractions. Our algorithm uses a Poisson scan statistic to identify genomic rearrangement junctions, determines counts of junction-spanning molecules and applies Fisher's exact test to determine the statistical significance of somatic aberrations. Using a well-characterized human genome, we benchmarked the accuracy of this approach in identifying large rearrangements. Subsequently, we demonstrated that our algorithm identifies somatic rearrangements when they are present at the lower allelic fractions that occur in tumors. We characterized a set of complex cancer rearrangements with multiple classes of structural aberrations and with possible roles in oncogenesis.

    DOI: 10.1093/nar/gkx1193

    PubMed ID: 29186506

    PubMed Central ID: PMC5829571
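    The significance step described above can be illustrated with a one-sided Fisher's exact test written against the Python standard library. The 2x2 layout used here (tumor versus normal rows, junction-spanning versus other molecules as columns) is an assumption for illustration, not necessarily the table ZoomX builds, and the Poisson scan statistic that finds candidate junctions is not reproduced.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: row 1 = tumor, row 2 = normal; column 1 = molecules
    spanning the candidate junction, column 2 = other molecules at the locus.
    Returns P(X >= a), where X is hypergeometric with all margins fixed,
    i.e. the chance of at least this much junction support in the tumor.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# 12 of 400 tumor molecules span the junction versus 0 of 380 in normal.
p = fisher_exact_greater(12, 388, 0, 380)
```

    Conditioning on the margins makes the test exact even when junction-spanning counts are small, which is the regime of low allelic fraction somatic events.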

  • Robust Multiplexed Clustering and Denoising of Digital PCR Assays by Data Gridding ANALYTICAL CHEMISTRY Lau, B. T., Wood-Bouwens, C., Ji, H. P. 2017; 89 (22): 11913–17


    Digital PCR (dPCR) relies on the analysis of individual partitions to accurately quantify nucleic acid species. The most widely used analysis method requires manual clustering through individual visual inspection. Some automated analysis methods have emerged, but they do not robustly account for multiplexed targets, low target concentration, and assay noise. In this study, we describe open source analysis software called Calico that uses "data gridding" to increase the sensitivity of clustering toward small clusters. Our workflow also generates quality score metrics to gauge and filter individual assay partitions by how well they were classified. We applied our analysis algorithm to multiplexed droplet digital PCR (ddPCR) data sets in both EvaGreen and probe-based schemes, targeting the oncogenic BRAF V600E and KRAS G12D mutations. We demonstrate automated clustering sensitivity down to a 0.1% mutant fraction and filtering of artifactual assay partitions from low-quality DNA samples. Overall, we demonstrate a vastly improved approach to analyzing ddPCR data that can be applied to clinical use, where automation and reproducibility are critical.

    DOI: 10.1021/acs.analchem.7b02688

    Web of Science ID: 000416498100006

    PubMed ID: 29083143
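    A minimal sketch of the "data gridding" idea, assuming two-channel droplet amplitudes given as (x, y) pairs: points are binned into square cells and occupancy is counted, so that a small but tight cluster can still appear as its own occupied cell next to a much larger one. The function name and the `cell_size`/`min_count` parameters are hypothetical, and Calico's actual clustering and quality-scoring steps are not reproduced here.

```python
from collections import Counter


def grid_dense_cells(points, cell_size, min_count):
    """Bin 2-D droplet amplitudes into square grid cells.

    points: iterable of (channel1, channel2) amplitudes.
    Returns (dense, counts): the set of cells holding at least min_count
    droplets, and a Counter of droplets per occupied cell. Cell indices
    use floor division, so negative amplitudes are binned consistently.
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}
    return dense, counts


# A large negative cluster, a small positive cluster, and one stray droplet.
droplets = [(0.2, 0.3)] * 500 + [(6.4, 5.1)] * 4 + [(9.8, 0.4)]
dense, counts = grid_dense_cells(droplets, cell_size=1.0, min_count=3)
```

    Because the threshold applies per cell rather than relative to the largest cluster, four mutant-positive droplets survive while a single stray droplet is set aside as a candidate artifact.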

  • Chromosome-scale mega-haplotypes enable digital karyotyping of cancer aneuploidy NUCLEIC ACIDS RESEARCH Bell, J. M., Lau, B. T., Greer, S. U., Wood-Bouwens, C., Xia, L. C., Connolly, I. D., Gephart, M. H., Ji, H. P. 2017; 45 (19): e162


    Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy, in which chromosomes are duplicated beyond the normal diploid content. However, accurately determining aneuploidy events in cancer genomes is a challenge. Recent advances in sequencing technology allow the characterization of haplotypes that extend megabases along the human genome using high molecular weight (HMW) DNA. For this study, we employed a library preparation method in which sequence reads have barcodes linked to single HMW DNA molecules. Barcode-linked reads are used to generate extended haplotypes on the order of megabases. We developed a method that leverages haplotypes to identify chromosomal segmental alterations in cancer and uses this information to join haplotypes together, thus extending the range of phased variants. With this approach, we identified mega-haplotypes that encompass entire chromosome arms. We characterized the chromosomal arm changes and aneuploidy events in a manner that offers similar information to a traditional karyotype but with the benefit of DNA sequence resolution. We applied this approach to characterize aneuploidy and chromosomal alterations from a series of primary colorectal cancers.

    DOI: 10.1093/nar/gkx712

    Web of Science ID: 000414552300001

    PubMed ID: 28977555

    PubMed Central ID: PMC5737808
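    One way to picture how a chromosomal alteration can be used to join haplotypes is sketched below in Python, under the assumption that a chromosome arm carries a copy-number gain on one haplotype. Within each phase block the hap1/hap2 labels are arbitrary, but the gained haplotype has more supporting reads, so blocks can be oriented consistently and concatenated. This is a simplified illustration with a hypothetical `orient_blocks` helper, not the published method; in a balanced region the depths carry no phase information and this approach does not apply.

```python
def orient_blocks(blocks):
    """Join phase blocks across a chromosome arm with an allelic imbalance.

    blocks: list of phase blocks; each block is a list of
        (position, hap1_depth, hap2_depth) for its heterozygous SNVs.
    Within a block the hap1/hap2 labels are arbitrary, so any block whose
    hap2 has more total reads is flipped. Afterwards the over-represented
    (assumed gained) haplotype is labelled hap1 in every block, and the
    blocks are concatenated into one arm-scale haplotype.
    """
    joined = []
    for block in blocks:
        depth1 = sum(h1 for _, h1, _ in block)
        depth2 = sum(h2 for _, _, h2 in block)
        if depth2 > depth1:
            block = [(pos, h2, h1) for pos, h1, h2 in block]
        joined.extend(block)
    return joined


blocks = [
    [(100, 30, 10), (250, 28, 12)],   # gained haplotype already labelled hap1
    [(900, 9, 31), (1200, 11, 29)],   # labels swapped relative to block 1
]
joined = orient_blocks(blocks)
```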

  • Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes BMC GENOMICS Lau, B. T., Ji, H. P. 2017; 18: 745


    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes, commonly in the form of random nucleotides, were recently introduced to improve gene expression measures by detecting amplification duplicates, but they are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification, especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels.

    We experimentally showed that random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts.

    We describe the first study to use transposable molecular barcodes and to apply them to studying random-mer molecular barcode errors. The extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.

    DOI: 10.1186/s12864-017-4141-4

    Web of Science ID: 000411432500002

    PubMed ID: 28934929

    PubMed Central ID: PMC5609065
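    The advantage of error-correcting barcodes over random-mers can be seen with a nearest-neighbour decoder. The Python sketch below assumes a known whitelist whose entries differ pairwise at three or more positions, so a single substitution still decodes uniquely; random-mer barcodes have no whitelist, so the same error silently creates a new "molecule". The helper names are hypothetical, and this is not necessarily the decoding scheme used in the paper.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, whitelist, max_dist=1):
    """Map an observed barcode to its unique nearest whitelist entry.

    If whitelist entries differ pairwise at >= 2 * max_dist + 1 positions,
    any barcode with at most max_dist substitutions has exactly one nearest
    entry. Returns None when nothing is within max_dist or the nearest
    distance is shared by two entries.
    """
    best, best_d, tied = None, max_dist + 1, False
    for candidate in whitelist:
        d = hamming(observed, candidate)
        if d < best_d:
            best, best_d, tied = candidate, d, False
        elif d == best_d:
            tied = True
    return None if best is None or tied else best


whitelist = ["AAAAAA", "CCCCCC", "GGGGGG"]  # toy codes, pairwise distance 6
print(correct_barcode("AAAGAA", whitelist))  # one substitution
```

    Reads whose barcode decodes to None can be discarded instead of being counted as extra molecules, which is the false positive failure mode described above.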

  • Genomic Instability in Cancer: Teetering on the Limit of Tolerance CANCER RESEARCH Andor, N., Maley, C. C., Ji, H. P. 2017; 77 (9): 2179-2185


    Cancer genomic instability contributes to the phenomenon of intratumoral genetic heterogeneity, provides the genetic diversity required for natural selection, and enables the extensive phenotypic diversity that is frequently observed among patients. Genomic instability has previously been associated with poor prognosis. However, we have evidence that for solid tumors of epithelial origin, extreme levels of genomic instability, where more than 75% of the genome is subject to somatic copy number alterations, are associated with a potentially better prognosis compared with intermediate levels below this threshold. This has been observed in clonal subpopulations of larger size, especially when genomic instability is shared among a limited number of clones. We hypothesize that cancers with extreme levels of genomic instability may be teetering on the brink of a threshold where so much of their genome is adversely altered that cells rarely replicate successfully. Another possibility is that tumors with high levels of genomic instability are more immunogenic than other cancers with a less extensive burden of genetic aberrations. Regardless of the exact mechanism, genomic instability has important therapeutic implications, although acting on them hinges on our ability to quantify how a tumor's burden of genetic aberrations is distributed among coexisting clones. Herein, we explore the possibility that high genomic instability could be the basis for a tumor's sensitivity to DNA-damaging therapies. We primarily focus on studies of epithelial-derived solid tumors. Cancer Res; 77(9); 2179-85. ©2017 AACR.

    DOI: 10.1158/0008-5472.CAN-16-1553

    Web of Science ID: 000400270100001

    PubMed ID: 28432052

    PubMed Central ID: PMC5413432
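    The 75% threshold above refers to the fraction of the genome affected by somatic copy number alterations. A minimal Python sketch of that calculation follows, assuming non-overlapping segments with integer copy-number calls and a diploid baseline; the function name, segment format and baseline are illustrative assumptions, and choosing a baseline for a highly aneuploid tumor is itself a nontrivial step.

```python
def fraction_genome_altered(segments, genome_length, baseline_cn=2):
    """Fraction of the genome covered by copy-number-altered segments.

    segments: (start, end, copy_number) tuples with half-open,
        non-overlapping coordinates on one linear coordinate system.
    A segment counts as altered when its copy number differs from
    baseline_cn (diploid by default).
    """
    altered = sum(end - start for start, end, cn in segments if cn != baseline_cn)
    return altered / genome_length


segments = [(0, 40, 2), (40, 70, 3), (70, 90, 1), (90, 100, 4)]
fga = fraction_genome_altered(segments, genome_length=100)
extreme = fga > 0.75  # the cutoff quoted in the abstract
```

    Here 60 of 100 positions are altered, so this toy genome falls in the intermediate range rather than the extreme-instability group.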

  • Tandem Oligonucleotide Probe Annealing and Elongation To Discriminate Viral Sequence ANALYTICAL CHEMISTRY Taskova, M., Uhd, J., Miotke, L., Kubit, M., Bell, J., Ji, H. P., Astakhova, K. 2017; 89 (8): 4363-4366


    New approaches for genomic DNA/RNA detection are in high demand, both to provide controls for existing enzymatic technologies and to create alternatives for emerging applications. In particular, there is an unmet need for rapid, reliable detection of short RNA regions, which could open up new opportunities in transcriptome analysis, virology, and other fields. Herein, we report for the first time a "click" chemistry approach to oligonucleotide probe elongation as a novel way to specifically detect a viral sequence. We hybridized a library of short, terminally labeled probes to Ebola virus RNA, followed by click assembly and analysis of the read sequence by various techniques. As we demonstrate in this paper, using our new approach, a viral RNA sequence can be detected in less than 2 h without the need for cDNA synthesis or any other enzymatic reactions and with a sensitivity of <10 pM target RNA.

    DOI: 10.1021/acs.analchem.7b00646

    Web of Science ID: 000399858800008

    PubMed ID: 28382823

  • CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis NATURE COMMUNICATIONS Shin, G., Grimes, S. M., Lee, H., Lau, B. T., Xia, L. C., Ji, H. P. 2017; 8


    Microsatellites are multi-allelic and composed of short tandem repeats (STRs) whose individual motifs range from mononucleotides and dinucleotides up to larger units, including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel and provides accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.

    DOI: 10.1038/ncomms14291

    Web of Science ID: 000393379700001

    PubMed ID: 28169275

    PubMed Central ID: PMC5309709
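    When reads span the complete repeat, as STR-Seq's targeted fragments are designed to, genotyping a microsatellite reduces to counting motif copies per read and tallying alleles. The Python sketch below assumes error-free reads and a known left flank; the function names and the `min_support` cutoff are hypothetical, and a real caller must also handle sequencing errors and reads that do not span the repeat, which this toy ignores.

```python
from collections import Counter


def repeat_units(read, motif, left_flank):
    """Count tandem copies of motif starting right after left_flank.

    Returns None when the flank is absent from the read.
    """
    i = read.find(left_flank)
    if i < 0:
        return None
    pos, n = i + len(left_flank), 0
    while read.startswith(motif, pos):
        n += 1
        pos += len(motif)
    return n


def call_genotype(reads, motif, left_flank, min_support=2):
    """Return a diploid genotype (two repeat counts) from per-read counts."""
    counts = Counter()
    for read in reads:
        units = repeat_units(read, motif, left_flank)
        if units is not None:
            counts[units] += 1
    alleles = sorted(u for u, c in counts.most_common(2) if c >= min_support)
    if not alleles:
        return None
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # homozygous call
    return tuple(alleles)


flank, motif = "TTGGAT", "AC"
reads = ([flank + motif * 5 + "GGG"] * 3 + [flank + motif * 7 + "GGG"] * 2
         + [flank + motif * 9 + "GGG"])
print(call_genotype(reads, motif, flank))
```

    The single read with nine units lacks support and is ignored, so the call is the heterozygous genotype of 5 and 7 repeat units.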

  • Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases. Genome medicine Greer, S. U., Nadauld, L. D., Lau, B. T., Chen, J., Wood-Bouwens, C., Ford, J. M., Kuo, C. J., Ji, H. P. 2017; 9 (1): 57


    Genome rearrangements are critical oncogenic driver events in many malignancies. However, identifying cancer genomic rearrangements and resolving their structure remain challenging, even with whole genome sequencing.

    To identify oncogenic genomic rearrangements and resolve their structure, we analyzed linked read sequencing data. This approach relies on a microfluidic droplet technology to produce libraries derived from single, high molecular weight DNA molecules 50 kb in size or greater. After sequencing, the barcoded sequence reads provide long-range genomic information, identify individual high molecular weight DNA molecules, determine the haplotype context of genetic variants that occur across contiguous megabase-length segments of the genome, and delineate the structure of complex rearrangements. We applied linked read sequencing of whole genomes to the analysis of a set of synchronous metastatic diffuse gastric cancers that occurred in the same individual.

    When comparing metastatic sites, our analysis implicated a complex somatic rearrangement that was present in the metastatic tumor. The oncogenic event associated with the identified complex rearrangement resulted in an amplification of the known cancer driver gene FGFR2. With further investigation using these linked read data, the FGFR2 copy number alteration was determined to be a deletion-inversion motif that underwent tandem duplication, with unique breakpoints in each metastasis. Using a three-dimensional organoid tissue model, we functionally validated the metastatic potential of an FGFR2 amplification in gastric cancer.

    Our study demonstrates that linked read sequencing is useful in characterizing oncogenic rearrangements in cancer metastasis.

    DOI: 10.1186/s13073-017-0447-8

    PubMed ID: 28629429

    PubMed Central ID: PMC5477353
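    The long-range signal that linked reads provide can be illustrated by counting barcodes shared between two genomic windows. In the Python sketch below (hypothetical helpers, not the study's pipeline), reads are assumed to be a list of (chromosome, position, barcode) tuples. Because reads sharing a barcode largely derive from the same long DNA molecule, many shared barcodes between windows that are far apart in the reference suggest the two loci are adjacent in the sample genome, for example across a rearrangement junction.

```python
def barcodes_in(reads, chrom, start, end):
    """Set of barcodes from reads whose position falls in [start, end) on chrom."""
    return {bc for c, pos, bc in reads if c == chrom and start <= pos < end}


def shared_barcodes(reads, window_a, window_b):
    """Number of barcodes observed in both (chrom, start, end) windows."""
    return len(barcodes_in(reads, *window_a) & barcodes_in(reads, *window_b))


reads = [
    ("chr10", 1_000, "BC1"), ("chr10", 1_400, "BC2"),
    ("chr10", 5_000_200, "BC1"), ("chr10", 5_000_900, "BC2"),
    ("chr10", 5_000_500, "BC3"),
]
n = shared_barcodes(reads, ("chr10", 0, 2_000), ("chr10", 4_999_000, 5_001_000))
```

    In practice such counts would be compared with a background expectation for windows at the same distance before being treated as evidence of a junction.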

  • Precision Oncology Strategy in Trastuzumab-Resistant Human Epidermal Growth Factor Receptor 2-Positive Colon Cancer: Case Report of Durable Response to Ado-Trastuzumab Emtansine JCO PRECISION ONCOLOGY Haslem, D. S., Ji, H. P., Ford, J. M., Nadauld, L. D. 2017; 1
  • Intestinal Enteroendocrine Lineage Cells Possess Homeostatic and Injury-Inducible Stem Cell Activity. Cell Stem Cell Yan, K., Gevaert, O., Zheng, G., Anchang, B., Probert, C., et al. 2017; 21 (1): 78-90.e6


    Several cell populations have been reported to possess intestinal stem cell (ISC) activity during homeostasis and injury-induced regeneration. Here, we explored inter-relationships between putative mouse ISC populations by comparative RNA-sequencing (RNA-seq). The transcriptomes of multiple cycling ISC populations closely resembled Lgr5+ ISCs, the most well-defined ISC pool, but Bmi1-GFP+ cells were distinct and enriched for enteroendocrine (EE) markers, including Prox1. Prox1-GFP+ cells exhibited sustained clonogenic growth in vitro, and lineage-tracing of Prox1+ cells revealed long-lived clones during homeostasis and after radiation-induced injury in vivo. Single-cell mRNA-seq revealed two subsets of Prox1-GFP+ cells, one of which resembled mature EE cells while the other displayed low-level EE gene expression but co-expressed tuft cell markers, Lgr5 and Ascl2, reminiscent of label-retaining secretory progenitors. Our data suggest that the EE lineage, including mature EE cells, comprises a reservoir of homeostatic and injury-inducible ISCs, extending our understanding of cellular plasticity and stemness.

    View details for DOI 10.1016/j.stem.2017.06.014

    View details for PubMedCentralID PMC5642297

  • A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic acids research Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R. 2016; 44 (15)


    We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read pairs, read depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a series of novel mid-range insertions and deletions that were confirmed by targeted deep resequencing. An R package implementation of SWAN is open source and freely available.

    View details for DOI 10.1093/nar/gkw481

    View details for PubMedID 27325742

    View details for PubMedCentralID PMC5009736
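
    SWAN itself is an R package that combines read-pair, read-depth, one-end-mapped and split-read evidence. As a sketch of only the read-depth ingredient, the Python example below scans binned read counts for the window whose depth is best explained by a Poisson rate reduced to half of the expected mean (as for a heterozygous deletion) rather than by the normal rate. The bin counts, window width and 0.5 ratio are assumptions for illustration; this is a plain likelihood-ratio scan, not SWAN's Poisson field scan statistic or its significance thresholds.

```python
import math
from typing import List, Tuple


def depth_llr(observed: int, expected: float, ratio: float = 0.5) -> float:
    """Poisson log-likelihood ratio of 'mean depth = ratio * expected' versus
    'mean depth = expected' for an observed read count.

    log P(k | ratio * mu) - log P(k | mu) = k * log(ratio) + (1 - ratio) * mu
    """
    return observed * math.log(ratio) + (1.0 - ratio) * expected


def scan_for_deletion(bin_counts: List[int], mean_per_bin: float, width: int,
                      ratio: float = 0.5) -> Tuple[int, float]:
    """Slide a window of `width` bins along the read counts and return the start
    bin and log-likelihood ratio of the most deletion-like window."""
    if not 0 < width <= len(bin_counts):
        raise ValueError("width must be between 1 and len(bin_counts)")
    window = sum(bin_counts[:width])
    best_start, best_llr = 0, depth_llr(window, mean_per_bin * width, ratio)
    for start in range(1, len(bin_counts) - width + 1):
        # Update the running sum: add the bin entering, drop the bin leaving.
        window += bin_counts[start + width - 1] - bin_counts[start - 1]
        llr = depth_llr(window, mean_per_bin * width, ratio)
        if llr > best_llr:
            best_start, best_llr = start, llr
    return best_start, best_llr
```

    For example, with a mean of 10 reads per bin, a run of bins at half that depth yields the highest positive score for the window aligned with it, while windows at normal depth score negatively.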

  • Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nature biotechnology Zheng, G. X., Lau, B. T., Schnall-Levin, M., Jarosz, M., Bell, J. M., Hindson, C. M., Kyriazopoulou-Panagiotopoulou, S., Masquelier, D. A., Merrill, L., Terry, J. M., Mudivarti, P. A., Wyatt, P. W., Bharadwaj, R., Makarewicz, A. J., Li, Y., Belgrader, P., Price, A. D., Lowe, A. J., Marks, P., Vurens, G. M., Hardenbol, P., Montesclaros, L., Luo, M., Greenfield, L., Wong, A., Birch, D. E., Short, S. W., Bjornson, K. P., Patel, P., Hopmans, E. S., Wood, C., Kaur, S., Lockwood, G. K., Stafford, D., Delaney, J. P., Wu, I., Ordonez, H. S., Grimes, S. M., Greer, S., Lee, J. Y., Belhocine, K., Giorda, K. M., Heaton, W. H., McDermott, G. P., Bent, Z. W., Meschi, F., Kondov, N. O., Wilson, R., Bernate, J. A., Gauby, S., Kindwall, A., Bermejo, C., Fehr, A. N., Chan, A., Saxonov, S., Ness, K. D., Hindson, B. J., Ji, H. P. 2016; 34 (3): 303-311


    Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

    View details for DOI 10.1038/nbt.3432

    View details for PubMedID 26829319

    View details for PubMedCentralID PMC4786454

  • Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nature medicine Andor, N., Graham, T. A., Jansen, M., Xia, L. C., Aktipis, C. A., Petritsch, C., Ji, H. P., Maley, C. C. 2016; 22 (1): 105-113


    Intratumor heterogeneity (ITH) drives neoplastic progression and therapeutic resistance. We used the bioinformatics tools 'expanding ploidy and allele frequency on nested subpopulations' (EXPANDS) and PyClone to detect clones that are present at a ≥10% frequency in 1,165 exome sequences from tumors in The Cancer Genome Atlas. 86% of tumors across 12 cancer types had at least two clones. ITH in the morphology of nuclei was associated with genetic ITH (Spearman's correlation coefficient, ρ = 0.24-0.41; P < 0.001). Mutation of a driver gene that typically appears in smaller clones was a survival risk factor (hazard ratio (HR) = 2.15, 95% confidence interval (CI): 1.71-2.69). The risk of mortality also increased when >2 clones coexisted in the same tumor sample (HR = 1.49, 95% CI: 1.20-1.87). In two independent data sets, copy-number alterations affecting either <25% or >75% of a tumor's genome predicted reduced risk (HR = 0.15, 95% CI: 0.08-0.29). Mortality risk also declined when >4 clones coexisted in the sample, suggesting a trade-off between the costs and benefits of genomic instability. ITH and genomic instability thus have the potential to be useful measures that can universally be applied to all cancers.

    View details for DOI 10.1038/nm.3984

    View details for PubMedID 26618723

  • The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations GENOME MEDICINE Lee, H., Palm, J., Grimes, S. M., Ji, H. P. 2015; 7 (1): 112


    The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship between TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, microRNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background, and results are displayed on our portal in an easy-to-navigate interface according to the user's input. To derive these associations, we relied on elastic-net regularized multiple linear regression of each clinical parameter on the multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/microRNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/microRNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies.

    View details for DOI 10.1186/s13073-015-0226-3

    View details for Web of Science ID 000363619100002

    View details for PubMedID 26507825

    View details for PubMedCentralID PMC4624593
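
    The Clinical Explorer abstract relies on elastic-net regularized regression with bootstrapping. To show what that involves, here is a minimal pure-Python sketch, not the authors' code: cyclic coordinate descent for the elastic-net objective written in the docstring (the same form scikit-learn's `ElasticNet` uses), plus a bootstrap loop that reports how often each feature receives a non-zero coefficient. It assumes centred data and fits no intercept; the function names are hypothetical, and in practice an established implementation such as glmnet or scikit-learn would be used.

```python
import random
from typing import List

Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X: Matrix, y: List[float], alpha: float, l1_ratio: float = 0.5,
                n_iter: int = 100) -> List[float]:
    """Cyclic coordinate descent for
        (1 / 2n) * ||y - X b||^2 + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||^2).
    Assumes y and the columns of X are centred; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_frequency(X: Matrix, y: List[float], alpha: float,
                        n_boot: int = 20, seed: int = 0) -> List[float]:
    """Fraction of bootstrap resamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        coef = elastic_net([X[i] for i in idx], [y[i] for i in idx], alpha)
        for j, c in enumerate(coef):
            if c != 0.0:
                hits[j] += 1
    return [h / n_boot for h in hits]
```

    Features selected in nearly every resample are robust predictors in the sense used above; the penalty `alpha` would normally be chosen by cross-validation rather than fixed by hand.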

  • A new multiple feature approach for rapid and highly accurate somatic structural variation discovery from whole cancer genome sequencing Xia, L. C., Bell, J., Chen, J., Zhang, N. R., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2015
  • Allele-specific copy number profiling by next-generation DNA sequencing. Nucleic acids research Chen, H., Bell, J. M., Zavala, N. A., Ji, H. P., Zhang, N. R. 2015; 43 (4)


    The progression and clonal development of tumors often involve amplifications and deletions of genomic DNA. Estimation of allele-specific copy number, which quantifies the number of copies of each allele at each variant locus rather than the total number of chromosome copies, is an important step in the characterization of tumor genomes and the inference of their clonal history. We describe a new method, falcon, for finding somatic allele-specific copy number changes by next-generation sequencing of tumors with matched normals. falcon is based on a change-point model on a bivariate mixed Binomial process, which explicitly models the copy numbers of the two chromosome haplotypes and corrects for local allele-specific coverage biases. By using the Binomial distribution rather than a normal approximation, falcon more effectively pools evidence from sites with low coverage. A modified Bayesian information criterion is used to guide model selection for determining the number of copy number events. Falcon is evaluated on in silico spike-in data and applied to the analysis of a pre-malignant colon tumor sample and late-stage colorectal adenocarcinoma from the same individual. The allele-specific copy number estimates obtained by falcon allow us to draw detailed conclusions regarding the clonal history of the individual's colon cancer.

    View details for DOI 10.1093/nar/gku1252

    View details for PubMedID 25477383

    View details for PubMedCentralID PMC4344483
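
    falcon's full model (a change-point search over a bivariate binomial process, coverage-bias correction and a modified BIC) is beyond a short example, but its basic ingredient, a binomial likelihood for allele counts at heterozygous sites, can be sketched. The Python code below assumes a single segment that has already been defined, 100% tumor purity and unknown phase (allele A is equally likely to lie on either haplotype), and grid-searches the fraction of copies carried by the minor haplotype. It is an illustration under those assumptions, not falcon's implementation, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical input: per heterozygous site, (reads supporting allele A, total reads)
Site = Tuple[int, int]


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log binomial probability of k successes in n trials, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def segment_loglik(sites: List[Site], minor_frac: float) -> float:
    """Log-likelihood of a segment whose minor haplotype carries a fraction
    `minor_frac` of the copies. Phase is treated as unknown: at each site,
    allele A is on the minor or the major haplotype with equal probability."""
    total = 0.0
    for k, n in sites:
        a = log_binom_pmf(k, n, minor_frac)
        b = log_binom_pmf(k, n, 1.0 - minor_frac)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def estimate_minor_fraction(sites: List[Site], steps: int = 50) -> float:
    """Grid-search maximum-likelihood minor-haplotype fraction in (0, 0.5]."""
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    return max(grid, key=lambda f: segment_loglik(sites, f))
```

    An estimate near 0.5 is consistent with balanced haplotype copies, near 1/3 with, for example, one minor and two major copies, and near 0 with loss of heterozygosity; turning the fraction into integer copy numbers also requires total depth relative to the matched normal.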

  • Enzyme-Free Detection of Mutations in Cancer DNA Using Synthetic Oligonucleotide Probes and Fluorescence Microscopy. PloS one Miotke, L., Maity, A., Ji, H., Brewer, J., Astakhova, K. 2015; 10 (8)


    Rapid, reliable diagnostics of DNA mutations are highly desirable in research and clinical assays. Current development in this field is proceeding simultaneously in two directions: 1) high-throughput methods, and 2) portable assays. Non-enzymatic approaches are attractive for both types of methods since they would allow rapid and relatively inexpensive detection of nucleic acids. Modern fluorescence microscopy is having a huge impact on detection of biomolecules at previously unachievable resolution. However, no straightforward methods to detect DNA in a non-enzymatic way using fluorescence microscopy and nucleic acid analogues have been proposed so far. Here we report a novel enzyme-free approach to efficiently detect cancer mutations. This assay includes gene-specific target enrichment followed by annealing to oligonucleotides containing locked nucleic acids (LNAs) and, finally, detection by fluorescence microscopy. The LNA-containing probes display high binding affinity and specificity to DNA containing mutations, which allows for the detection of mutation abundance with an intercalating EvaGreen dye. We used a second probe, which increases the overall number of base pairs, to produce a higher fluorescence signal by incorporating more dye molecules. Indeed, we show here that using EvaGreen dye and LNA probes, genomic DNA containing the BRAF V600E mutation could be detected by fluorescence microscopy at low femtomolar concentrations. Notably, this was at least 1000-fold above the potential detection limit. Overall, the novel assay we describe could become a new approach to rapid, reliable and enzyme-free diagnostics of cancer or other associated DNA targets. Importantly, stoichiometry of wild-type and mutant targets is conserved in our assay, which allows for an accurate estimation of mutant abundance when the detection limit requirement is met. Using fluorescence microscopy, this approach presents the opportunity to detect DNA at single-molecule resolution and directly in the biological sample of choice.

    View details for DOI 10.1371/journal.pone.0136720

    View details for PubMedID 26312489

  • Emergence of Hemagglutinin Mutations During the Course of Influenza Infection. Scientific reports Cushing, A., Kamali, A., Winters, M., Hopmans, E. S., Bell, J. M., Grimes, S. M., Xia, L. C., Zhang, N. R., Moss, R. B., Holodniy, M., Ji, H. P. 2015; 5: 16178


    Influenza remains a significant cause of disease mortality. The ongoing threat of influenza infection is partly attributable to the emergence of new mutations in the influenza genome. Among the influenza viral gene products, the hemagglutinin (HA) glycoprotein plays a critical role in influenza pathogenesis, is the target for vaccines and accumulates new mutations that may alter the efficacy of immunization. To study the emergence of HA mutations during the course of infection, we employed a deep-targeted sequencing method. We used samples from 17 patients with active H1N1 or H3N2 influenza infections. These patients were not treated with antivirals. In addition, we had samples from five patients who were analyzed longitudinally. Thus, we determined the quantitative changes in the fractional representation of HA mutations during the course of infection. Across individuals in the study, we identified a series of novel HA mutations that directly altered the HA coding sequence. Serial viral sampling revealed HA mutations that were stable, expanded or declined in representation during the course of the infection. Overall, we demonstrated the emergence of unique mutations specific to an infected individual and temporal genetic variation during infection.

    View details for DOI 10.1038/srep16178

    View details for PubMedID 26538451

    View details for PubMedCentralID PMC4633648

  • Single-Color, Multiplexed, Droplet Digital PCR Analysis of the Clinical Significance of Hemizygous Loss of WRN Gene in Colorectal Cancer Lee, H., Lau, B., Zavala, N. A., Ji, H. P. ELSEVIER SCIENCE INC. 2014: 768
  • Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture NATURE MEDICINE Li, X., Nadauld, L., Ootani, A., Corney, D. C., Pai, R. K., Gevaert, O., Cantrell, M. A., Rack, P. G., Neal, J. T., Chan, C. W., Yeung, T., Gong, X., Yuan, J., Wilhelmy, J., Robine, S., Attardi, L. D., Plevritis, S. K., Hung, K. E., Chen, C., Ji, H. P., Kuo, C. J. 2014; 20 (7): 769-777


    The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.

    View details for DOI 10.1038/nm.3585

    View details for Web of Science ID 000338689500021

  • A programmable method for massively parallel targeted sequencing. Nucleic acids research Hopmans, E. S., Natsoulis, G., Bell, J. M., Grimes, S. M., Sieh, W., Ji, H. P. 2014; 42 (10)


    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 megabases (Mb). We demonstrate abundance uniformity of greater than 90% within 1 log (10-fold) of the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy.

    View details for DOI 10.1093/nar/gku282

    View details for PubMedID 24782526
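
    The uniformity and targeting figures quoted above can be computed from per-target coverage and read counts. The Python sketch below uses one common reading of the two metrics, assumed here rather than taken from the paper: uniformity is the fraction of targets whose coverage lies within 10-fold (1 log) of the median target coverage, and targeting rate is the fraction of mapped reads that land on targets.

```python
import statistics
from typing import Sequence


def fold_uniformity(coverages: Sequence[float], fold: float = 10.0) -> float:
    """Fraction of targets whose coverage lies within `fold`-fold
    (10-fold = 1 log) of the median target coverage."""
    med = statistics.median(coverages)
    lo, hi = med / fold, med * fold
    return sum(lo <= c <= hi for c in coverages) / len(coverages)


def targeting_rate(on_target_reads: int, mapped_reads: int) -> float:
    """Fraction of mapped reads that fall within the targeted regions."""
    return on_target_reads / mapped_reads
```

    With per-target coverages of 100, 120, 80, 5, 2000 and 95, the median is 97.5, so only the 5x and 2000x targets fall outside the 9.75 to 975 range and uniformity is 4/6.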

  • High sensitivity detection and quantitation of DNA copy number and single nucleotide variants with single color droplet digital PCR. Analytical chemistry Miotke, L., Lau, B. T., Rumma, R. T., Ji, H. P. 2014; 86 (5): 2618-2624


    In this study, we present a highly customizable method for quantifying copy number and point mutations utilizing a single-color, droplet digital PCR platform. Droplet digital polymerase chain reaction (ddPCR) is rapidly replacing real-time quantitative PCR (qRT-PCR) as an efficient method of independent DNA quantification. Compared to quantitative PCR, ddPCR eliminates the need for traditional standards; instead, it measures target and reference DNA within the same well. The applications for ddPCR are widespread, including targeted quantitation of genetic aberrations, which is commonly achieved with a two-color fluorescent oligonucleotide probe (TaqMan) design. However, the overall cost and need for optimization can be greatly reduced with an alternative method of distinguishing between target and reference products using the nonspecific DNA binding properties of EvaGreen (EG) dye. By manipulating the length of the target and reference amplicons, we can distinguish between their fluorescent signals and quantify each independently. We demonstrate the effectiveness of this method by examining copy number in the proto-oncogene FLT3 and the common V600E point mutation in BRAF. Using a series of well-characterized control samples and cancer cell lines, we confirmed the accuracy of our method in quantifying mutation percentage and integer value copy number changes. As another novel feature, our assay was able to detect a mutation comprising less than 1% of an otherwise wild-type sample, as well as copy number changes from cancers even in the context of significant dilution with normal DNA. This flexible and cost-effective method of independent DNA quantification proves to be a robust alternative to the commercialized TaqMan assay.

    View details for DOI 10.1021/ac403843j

    View details for PubMedID 24483992
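
    Digital PCR quantification rests on a Poisson correction: if a fraction p of droplets is positive, the mean number of template copies per droplet is λ = -ln(1 - p). The Python sketch below applies that correction to copy number and mutant fraction. It assumes droplets have already been classified (in the single-color EvaGreen design, by separating amplicons on fluorescence amplitude), that each count includes droplets positive for both amplicons, and that the reference locus is diploid. The function names are illustrative.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - positive / total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_pos: int, ref_pos: int, total: int, ref_copies: int = 2) -> float:
    """Target copy number relative to a reference locus assumed to have
    `ref_copies` copies (the reference must have at least one positive droplet)."""
    return ref_copies * copies_per_droplet(target_pos, total) / copies_per_droplet(ref_pos, total)


def mutant_fraction(mut_pos: int, wt_pos: int, total: int) -> float:
    """Fraction of template molecules carrying the mutant allele."""
    mut = copies_per_droplet(mut_pos, total)
    wt = copies_per_droplet(wt_pos, total)
    return mut / (mut + wt)
```

    For example, 6,594 target-positive and 3,625 reference-positive droplets out of 20,000 correspond to about 0.40 and 0.20 copies per droplet, which gives a target copy number of about 4 against a diploid reference.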

  • A phase II study of capecitabine, carboplatin, and bevacizumab for metastatic or unresectable gastroesophageal junction and gastric adenocarcinoma. Kunz, P. L., Nandoskar, P., Koontz, M., Ji, H., Ford, J. M., Balise, R. R., Kamaya, A., Rubin, D., Fisher, G. A. AMER SOC CLINICAL ONCOLOGY. 2014
  • Metastatic tumor evolution and organoid modeling implicate TGFBR2 as a cancer driver in diffuse gastric cancer GENOME BIOLOGY Nadauld, L. D., Garcia, S., Natsoulis, G., Bell, J. M., Miotke, L., Hopmans, E. S., Xu, H., Pai, R. K., Palm, C., Regan, J. F., Chen, H., Flaherty, P., Ootani, A., Zhang, N. R., Ford, J. M., Kuo, C. J., Ji, H. P. 2014; 15 (8)


    Gastric cancer is the second-leading cause of global cancer deaths, with metastatic disease representing the primary cause of mortality. To identify candidate drivers involved in oncogenesis and tumor evolution, we conduct an extensive genome sequencing analysis of metastatic progression in a diffuse gastric cancer. This involves a comparison between a primary tumor from a hereditary diffuse gastric cancer syndrome proband and its recurrence as an ovarian metastasis. Both the primary tumor and ovarian metastasis have common biallelic loss-of-function of both the CDH1 and TP53 tumor suppressors, indicating a common genetic origin. While the primary tumor exhibits amplification of the Fibroblast growth factor receptor 2 (FGFR2) gene, the metastasis notably lacks FGFR2 amplification but rather possesses unique biallelic alterations of Transforming growth factor-beta receptor 2 (TGFBR2), indicating the divergent in vivo evolution of a TGFBR2-mutant metastatic clonal population in this patient. As TGFBR2 mutations have not previously been functionally validated in gastric cancer, we modeled the metastatic potential of TGFBR2 loss in a murine three-dimensional primary gastric organoid culture. The Tgfbr2 shRNA knockdown within Cdh1-/-; Tp53-/- organoids generates invasion in vitro and robust metastatic tumorigenicity in vivo, confirming Tgfbr2 metastasis suppressor activity. We document the metastatic differentiation and genetic heterogeneity of diffuse gastric cancer and reveal the potential metastatic role of TGFBR2 loss-of-function. In support of this study, we apply a murine primary organoid culture method capable of recapitulating in vivo metastatic gastric cancer. Overall, we describe an integrated approach to identify and functionally validate putative cancer drivers involved in metastasis.

    View details for DOI 10.1186/s13059-014-0428-9

    View details for Web of Science ID 000346604100009

    View details for PubMedID 25315765

    View details for PubMedCentralID PMC4145231

  • MendeLIMS: a web-based laboratory information management system for clinical genome sequencing. BMC bioinformatics Grimes, S. M., Ji, H. P. 2014; 15 (1): 290

    View details for DOI 10.1186/1471-2105-15-290

    View details for PubMedID 25159034

  • Identification of Insertion Deletion Mutations from Deep Targeted Resequencing. Journal of data mining in genomics & proteomics Natsoulis, G., Zhang, N., Welch, K., Bell, J., Ji, H. P. 2013; 4 (3)


    Taking advantage of the deep targeted sequencing capabilities of next generation sequencers, we have developed a novel two-step insertion deletion (indel) detection algorithm (IDA) that can determine indels from single read sequences with high computational efficiency and sensitivity, even when indel-bearing reads are present at low fractions relative to the wild-type reference sequence. First, it identifies candidate indel positions utilizing specific sequence alignment artifacts produced by rapid alignment programs. Second, it confirms the location of the candidate indel by using the Smith-Waterman (SW) algorithm on a restricted subset of sequence reads. We demonstrate that IDA is applicable to indels of varying sizes from deep targeted sequencing data at low fractions where the indel is diluted by wild-type sequence. Our algorithm is useful in detecting indel variants present at variable allelic frequencies such as may occur in heterozygotes and mixed normal-tumor tissue.

    View details for PubMedID 24511426

    View details for PubMedCentralID PMC3917607
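
    The confirmation step above uses Smith-Waterman local alignment. A textbook version with a linear gap penalty is sketched below in Python; the scoring values (match +2, mismatch -1, gap -2) are assumptions, and IDA's own parameters and candidate-selection step are not reproduced. A gap in the aligned read marks a candidate deletion relative to the reference, and a gap in the aligned reference marks a candidate insertion.

```python
from typing import List, Tuple


def smith_waterman(ref: str, read: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty.

    Returns (best score, aligned reference, aligned read), with '-' marking gaps."""
    rows, cols = len(ref) + 1, len(read) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if ref[i - 1] == read[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_i, best_j
    ref_out: List[str] = []
    read_out: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if ref[i - 1] == read[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            ref_out.append(ref[i - 1])
            read_out.append(read[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            # Reference base not present in the read: candidate deletion.
            ref_out.append(ref[i - 1])
            read_out.append("-")
            i -= 1
        else:
            # Read base not present in the reference: candidate insertion.
            ref_out.append("-")
            read_out.append(read[j - 1])
            j -= 1
    return best, "".join(reversed(ref_out)), "".join(reversed(read_out))
```

    For example, aligning the read `GATTACAGCTTACG` against the reference `GATTACAGGCTTACG` gives a score of 26 with a single gap in the read, reflecting a one-base deletion in the GG run; which G is shown as deleted depends on the traceback order.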

  • RVD: a command-line program for ultrasensitive rare single nucleotide variant detection using targeted next-generation DNA resequencing. BMC research notes Cushing, A., Flaherty, P., Hopmans, E., Bell, J. M., Ji, H. P. 2013; 6: 206


    Rare single nucleotide variants play an important role in genetic diversity and heterogeneity of specific human diseases. For example, an individual clinical sample can harbor rare mutations at minor frequencies. Genetic diversity within an individual clinical sample is oftentimes reflected in rare mutations. Therefore, detecting rare variants prior to treatment may prove to be a useful predictor for therapeutic response. Current rare variant detection algorithms using next generation DNA sequencing are limited by inherent sequencing error rate and platform availability. Here we describe an optimized implementation of a rare variant detection algorithm called RVD for use in targeted gene resequencing. RVD is available both as a command-line program and for use in MATLAB, and it estimates context-specific error using a beta-binomial model to call variants with minor allele frequency (MAF) as low as 0.1%. RVD accepts standard BAM formatted sequence files. We tested RVD analysis on multiple Illumina sequencing platforms, among the most widely used DNA sequencing platforms. RVD meets a growing need for highly sensitive and specific tools for variant detection. To demonstrate the usefulness of RVD, we carried out a thorough analysis of the software's performance on synthetic and clinical virus samples sequenced on both an Illumina GAIIx and a MiSeq. We expect RVD can improve understanding of the genetics and treatment of common viral diseases including influenza. RVD is available at the following URL:

    View details for DOI 10.1186/1756-0500-6-206

    View details for PubMedID 23701658
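
    RVD's beta-binomial error model can be illustrated generically. In the Python sketch below, a Beta(a, b) distribution is fitted by the method of moments to error rates observed at one position across reference replicates (an assumed input), and the upper tail of the resulting beta-binomial distribution gives the probability that sequencing error alone produces at least k non-reference reads out of n. This is a sketch of the model class, not RVD's test statistic, thresholds or code.

```python
import math
import statistics
from typing import List, Tuple


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def betabinom_sf(k: int, n: int, a: float, b: float) -> float:
    """Upper-tail probability P(X >= k): the chance that error alone yields
    at least k non-reference reads out of n."""
    return sum(betabinom_pmf(i, n, a, b) for i in range(k, n + 1))


def fit_beta_moments(error_rates: List[float]) -> Tuple[float, float]:
    """Method-of-moments Beta(a, b) fit to per-replicate error rates at one position."""
    m = statistics.fmean(error_rates)
    v = statistics.pvariance(error_rates)
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must be positive and below m * (1 - m)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common
```

    A small tail probability for the observed count in a test sample, after correcting for the number of positions tested, would flag the position as a candidate rare variant.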

  • Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis. BMC medical genomics Lee, H., Flaherty, P., Ji, H. P. 2013; 6: 54


    Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.

    View details for DOI 10.1186/1755-8794-6-54

    View details for PubMedID 24308539


    View details for DOI 10.1214/12-AOAS538

    View details for Web of Science ID 000314457400010

  • Identification of a novel deletion mutant strain in Saccharomyces cerevisiae that results in a microsatellite instability phenotype. BioDiscovery Ji, H. P., Morales, S., Welch, K., Yuen, C., Farnam, K., Ford, J. M. 2012


    The DNA mismatch repair (MMR) pathway corrects specific types of DNA replication errors that affect microsatellites and thus is critical for maintaining genomic integrity. The genes of the MMR pathway are highly conserved across different organisms. Likewise, defective MMR function universally results in microsatellite instability (MSI), which is a hallmark of certain types of cancer associated with the Mendelian disorder hereditary nonpolyposis colorectal cancer (Lynch syndrome). To identify previously unrecognized deleted genes or loci that can lead to MSI, we developed a functional genomics screen utilizing a plasmid containing a microsatellite sequence that is a hot spot for MSI mutations and the comprehensive homozygous diploid deletion mutant resource for Saccharomyces cerevisiae. This pool represents a collection of non-essential homozygous yeast diploid (2N) mutants in which there are deletions for over four thousand yeast open reading frames (ORFs). From our screen, we identified a deletion mutant strain of the PAU24 gene that leads to MSI. In a series of validation experiments, we determined that this PAU24 mutant strain had an increased MSI-specific mutation rate compared with the original wild-type background strain and other deletion mutants, comparable to that of an MMR mutant involving the MLH1 gene. Likewise, in yeast strains with a deletion of PAU24, we identified specific de novo indel mutations that occurred within the targeted microsatellite used for this screen.

    View details for PubMedID 23667739

  • Improving bioinformatic pipelines for exome variant calling GENOME MEDICINE Ji, H. P. 2012; 4


    Exome sequencing analysis is a cost-effective approach for identifying variants in coding regions. However, recognizing the relevant single nucleotide variants and small insertions and deletions remains a challenge for many researchers, and diagnostic laboratories typically do not have access to the bioinformatic analysis pipelines necessary for clinical application. The Atlas2 suite, recently released by the Baylor Genome Center, is designed to be widely accessible: it runs on desktop computers but is scalable to computational clusters, and it performs comparably with other popular variant callers. Atlas2 may be an accessible alternative for data processing when a rapid solution for variant calling is required.

    View details for DOI 10.1186/gm306

    View details for Web of Science ID 000314564600001

    View details for PubMedID 22289516

  • The Human OligoGenome Resource: a database of oligonucleotide capture probes for resequencing target regions across the human genome. Nucleic acids research Newburger, D. E., Natsoulis, G., Grimes, S., Bell, J. M., Davis, R. W., Batzoglou, S., Ji, H. P. 2012; 40 (Database issue): D1137-43


    Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. These include candidate region diagnostic resequencing and novel variant validation from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource, an online database that contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.

    View details for DOI 10.1093/nar/gkr973

    View details for Web of Science ID 000298601300170

    View details for PubMedID 22102592

  • Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY Lam, H. Y., Clark, M. J., Chen, R., Chen, R., Natsoulis, G., O'Huallachain, M., Dewey, F. E., Habegger, L., Ashley, E. A., Gerstein, M. B., Butte, A. J., Ji, H. P., Snyder, M. 2012; 30 (1): 78-U118

    View details for DOI 10.1038/nbt.2065

    View details for Web of Science ID 000299110600023

  • A cross-sample statistical model for SNP detection in short-read sequencing data NUCLEIC ACIDS RESEARCH Muralidharan, O., Natsoulis, G., Bell, J., Newburger, D., Xu, H., Kela, I., Ji, H., Zhang, N. 2012; 40 (1)


    Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.

    View details for DOI 10.1093/nar/gkr851

    View details for Web of Science ID 000298733500005

    View details for PubMedID 22064853

    View details for PubMedCentralID PMC3245949
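    The abstract does not spell out the empirical Bayes model. As a much simpler stand-in that illustrates the same idea of pooling information across samples, the sketch below estimates the error rate at one position from the other samples sequenced with the same protocol (taking the median alternate-allele fraction, so that a few true carriers do not inflate it) and flags a sample only when its alternate-allele count is improbable under that rate. The function names, the `alpha` cutoff, the `floor` on the error rate and the counts are illustrative assumptions.

```python
import math
from statistics import median


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def cross_sample_calls(alt_counts, depths, alpha=1e-6, floor=1e-4):
    """For one genomic position, flag samples whose alternate-allele count is
    improbable under an error rate learned from the *other* samples.

    Returns a list of (is_call, p_value), one per sample.
    """
    results = []
    for s, (k, n) in enumerate(zip(alt_counts, depths)):
        others = [a / d for i, (a, d) in enumerate(zip(alt_counts, depths))
                  if i != s and d > 0]
        # Median rather than pooled mean, so a few true carriers do not inflate the error rate.
        err = max(median(others), floor) if others else floor
        p_value = binom_sf(k, n, err)
        results.append((p_value < alpha, p_value))
    return results


# Hypothetical counts at one position in five samples run with the same protocol.
alt = [2, 1, 3, 2, 45]
depth = [100] * 5
print([is_call for is_call, _ in cross_sample_calls(alt, depth)])
# [False, False, False, False, True]
```

    A single-sample caller would have to assume an error rate; borrowing it from the other samples is what lets the low-level background at this position (1 to 3 reads in 100) be recognized as noise.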

  • Quantitative and Sensitive Detection of Cancer Genome Amplifications from Formalin Fixed Paraffin Embedded Tumors with Droplet Digital PCR. Translational medicine (Sunnyvale, Calif.) Nadauld, L., Regan, J. F., Miotke, L., Pai, R. K., Longacre, T. A., Kwok, S. S., Saxonov, S., Ford, J. M., Ji, H. P. 2012; 2 (2)


    For the analysis of cancer, there is great interest in rapid and accurate detection of cancer genome amplifications containing oncogenes that are potential therapeutic targets. The vast majority of cancer tissue samples are formalin fixed and paraffin embedded (FFPE), which enables histopathological examination and long-term archiving. However, FFPE cancer genomic DNA is often degraded and is generally a poor substrate for many molecular biology assays. To overcome the issues of poor DNA quality from FFPE samples and detect oncogenic copy number amplifications with high accuracy and sensitivity, we developed a novel approach. Our assay requires nanogram amounts of genomic DNA, thus facilitating the study of small clinical samples. Using droplet digital PCR (ddPCR), we can determine the relative copy number of specific genomic loci even in the presence of intermingled normal tissue. We used a control dilution series to determine the limits of detection for the ddPCR assay and report its improved sensitivity on minimal amounts of DNA compared with standard real-time PCR. To develop this approach, we designed an assay for the fibroblast growth factor receptor 2 gene (FGFR2), which is amplified in gastric and breast cancers, among others. We successfully used ddPCR to ascertain FGFR2 amplifications in FFPE-preserved gastrointestinal adenocarcinomas.

    View details for PubMedID 23682346

    View details for PubMedCentralID PMC3653435
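    The abstract does not give the quantification formula. Droplet digital PCR is conventionally quantified with a Poisson correction, because a positive droplet can hold more than one template. A minimal sketch, assuming a target assay (for example FGFR2) and a reference-locus assay read on the same DNA, and a diploid reference locus; the function names and droplet counts are hypothetical, and no correction is made for dilution by intermingled normal tissue.

```python
import math


def poisson_lambda(positive: int, total: int) -> float:
    """Mean template copies per droplet from the fraction of positive droplets.

    Under Poisson loading, P(droplet is negative) = exp(-lambda)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_pos, target_total, ref_pos, ref_total, ref_ploidy=2):
    """Copy number of a target locus relative to a reference locus of known ploidy."""
    ref_lambda = poisson_lambda(ref_pos, ref_total)
    if ref_lambda == 0.0:
        raise ValueError("reference assay has no positive droplets")
    return ref_ploidy * poisson_lambda(target_pos, target_total) / ref_lambda


# Hypothetical droplet counts for a target and a reference assay.
cn = copy_number(1900, 10000, 1000, 10000)
print(round(cn, 3))  # 4.0
```

    With these counts, the uncorrected ratio of positive droplets would give 2 × 1900 / 1000 = 3.8 copies; the Poisson correction recovers 4, and the gap widens as droplets approach saturation.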

  • Ultrasensitive detection of rare mutations using next-generation targeted resequencing NUCLEIC ACIDS RESEARCH Flaherty, P., Natsoulis, G., Muralidharan, O., Winters, M., Buenrostro, J., Bell, J., Brown, S., Holodniy, M., Zhang, N., Ji, H. P. 2012; 40 (1)


    With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify less prevalent, rare mutations in heterogeneous clinical samples. However, mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation, which corresponds to accurate detection of one mutant allele per 1000 wild-type alleles. To achieve this level of sensitivity, we integrate a high-accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. To validate our method, we applied it to a known 0.1% admixture of two synthetic DNA samples, where it was highly specific (99%) and sensitive (100%). As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.

    View details for DOI 10.1093/nar/gkr861

    View details for Web of Science ID 000298733500002

    View details for PubMedID 22013163

    View details for PubMedCentralID PMC3245950
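    The abstract describes estimating per-position error from replicate sequencing of a reference and then quantifying the variant fraction, but does not give the model. A minimal sketch under simple assumptions: the per-replicate error rates at a position are summarized by their mean and standard deviation, a sample is called when its variant-base fraction exceeds that mean by `z_cut` standard deviations, and the fraction is estimated by subtracting the mean error rate. This z-score rule is a stand-in for the paper's statistical model; the threshold, function name and counts are hypothetical.

```python
from statistics import mean, stdev


def call_rare_variant(sample_alt, sample_depth, ref_alt, ref_depth, z_cut=5.0):
    """Compare a sample's variant-base fraction at one position with the error
    distribution seen in replicate sequencing of a wild-type reference.

    ref_alt / ref_depth hold per-replicate counts at the same position.
    Returns (is_call, estimated_fraction, z_score).
    """
    ref_rates = [a / d for a, d in zip(ref_alt, ref_depth)]
    mu = mean(ref_rates)
    sd = stdev(ref_rates)  # requires at least two reference replicates
    observed = sample_alt / sample_depth
    if sd > 0:
        z = (observed - mu) / sd
    else:
        z = float("inf") if observed > mu else 0.0
    estimated = max(0.0, observed - mu)  # background-subtracted variant fraction
    return z > z_cut, estimated, z


# Hypothetical counts: four reference replicates and one sample, all at 100,000x.
ref_alt = [20, 25, 15, 20]
ref_depth = [100_000] * 4
is_call, frac, z = call_rare_variant(100, 100_000, ref_alt, ref_depth)
print(is_call, round(frac, 6))  # True 0.0008
```

    Using the spread across replicates, rather than the mean error alone, is what separates a 0.1% signal from position-specific noise; a sample carrying 25 variant reads in 100,000 would not be called here.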

  • Targeted sequencing library preparation by genomic DNA circularization BMC BIOTECHNOLOGY Myllykangas, S., Natsoulis, G., Bell, J. M., Ji, H. P. 2011; 11


    For next-generation DNA sequencing, we have developed a rapid and simple approach for preparing DNA libraries of targeted DNA content. Current protocols for preparing DNA for next-generation targeted sequencing are labor-intensive, require large amounts of starting material, and are prone to artifacts that result from necessary PCR amplification of sequencing libraries. Typically, sample preparation for targeted NGS is a two-step process in which (1) the desired regions are selectively captured and (2) the ends of the DNA molecules are modified to render them compatible with any given NGS platform.

    In this proof-of-concept study, we present an integrated approach that combines these two separate steps into one. Our method involves circularization of a specific genomic DNA molecule that directly incorporates the necessary components for conducting sequencing in a single assay and requires only one PCR amplification step. We also show that specific regions of the genome can be targeted and sequenced without any PCR amplification.

    We anticipate that these rapid targeted libraries will be useful for validation of variants and may have diagnostic application.

    View details for DOI 10.1186/1472-6750-11-122

    View details for Web of Science ID 000300427900001

    View details for PubMedID 22168766

  • Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing NATURE BIOTECHNOLOGY Myllykangas, S., Buenrostro, J. D., Natsoulis, G., Bell, J. M., Ji, H. P. 2011; 29 (11): 1024-U95


    We describe an approach for targeted genome resequencing, called oligonucleotide-selective sequencing (OS-Seq), in which we modify the immobilized lawn of oligonucleotide primers of a next-generation DNA sequencer to function as both a capture and sequencing substrate. We apply OS-Seq to resequence the exons of either 10 or 344 cancer genes from human DNA samples. In our assessment of capture performance, >87% of the captured sequence originated from the intended target region with sequencing coverage falling within a tenfold range for a majority of all targets. Single nucleotide variants (SNVs) called from OS-Seq data agreed with >95% of variants obtained from whole-genome sequencing of the same individual. We also demonstrate mutation discovery from a colorectal cancer tumor sample matched with normal tissue. Overall, we show the robust performance and utility of OS-Seq for the resequencing analysis of human germline and cancer genomes.

    View details for DOI 10.1038/nbt.1996

    View details for Web of Science ID 000296801300024

    View details for PubMedID 22020387

  • A Flexible Approach for Highly Multiplexed Candidate Gene Targeted Resequencing PLOS ONE Natsoulis, G., Bell, J. M., Xu, H., Buenrostro, J. D., Ordonez, H., Grimes, S., Newburger, D., Jensen, M., Zahn, J. M., Zhang, N., Ji, H. P. 2011; 6 (6)


    We have developed an integrated strategy for targeted resequencing and analysis of gene subsets from the human exome for variants. Our capture technology is geared towards resequencing gene subsets substantially larger than can be done efficiently with simplex or multiplex PCR but smaller in scale than exome sequencing. We describe all the steps from the initial capture assay to single nucleotide variant (SNV) discovery. The capture methodology uses in-solution 80-mer oligonucleotides. To provide optimal flexibility in choosing human gene targets, we designed an in silico set of oligonucleotides, the Human OligoExome, that covers the gene exons annotated by the Consensus Coding Sequence (CCDS) project. This resource is openly available as an Internet-accessible database where one can download capture oligonucleotide sequences for any CCDS gene and design custom capture assays. Using this resource, we demonstrated the flexibility of this assay by custom designing capture assays ranging from 10 to over 100 gene targets with total capture sizes from over 100 kilobases to nearly one megabase. We established a method to reduce capture variability and incorporated indexing schemes to increase sample throughput. Our approach has multiple applications that include, but are not limited to, population targeted resequencing studies of specific gene subsets, validation of variants discovered in whole genome sequencing surveys and possible diagnostic analysis of disease gene subsets. We also present a cost analysis demonstrating its cost-effectiveness for large population studies.

    View details for DOI 10.1371/journal.pone.0021088

    View details for Web of Science ID 000292291800008

    View details for PubMedID 21738606

    View details for PubMedCentralID PMC3127857

  • Genetic-based biomarkers and next-generation sequencing: the future of personalized care in colorectal cancer PERSONALIZED MEDICINE Kim, R. Y., Xu, H., Myllykangas, S., Ji, H. 2011; 8 (3): 331-345

    View details for DOI 10.2217/PME.11.16

    View details for Web of Science ID 000291444800013

  • Identification of Novel LNK Mutations In Patients with Chronic Myeloproliferative Neoplasms and Related Disorders 52nd Annual Meeting and Exposition of the American-Society-of-Hematology (ASH) Oh, S. T., Zahn, J. M., Jones, C. D., Zhang, B., Loh, M. L., Kantarjian, H., Simonds, E. F., Bruggner, R. V., Abidi, P., Natsoulis, G., Bell, J., Buenrostro, J., Nolan, G. P., Zehnder, J. L., Ji, H. P., Gotlib, J. AMER SOC HEMATOLOGY. 2010: 143–44
  • Detecting simultaneous changepoints in multiple sequences BIOMETRIKA Zhang, N. R., Siegmund, D. O., Ji, H., Li, J. Z. 2010; 97 (3): 631-645


    We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.

    View details for DOI 10.1093/biomet/asq025

    View details for Web of Science ID 000280904000008

    View details for PubMedCentralID PMC3372242
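    As a minimal sketch of the sum-of-chi-squares scan described above, assuming each sequence has already been normalized to baseline 0 and unit noise variance: for every candidate interval, each sample contributes (interval sum)² / length, which is chi-squared with one degree of freedom under the null, and the scan keeps the interval with the largest total. The function name and the `max_len` cap are illustrative; significance thresholds would come from the analytic approximations the paper derives, which are not reproduced here.

```python
def multisample_scan(samples, max_len=None):
    """Find the interval [start, end) where the sequences jointly deviate most.

    Each sequence is assumed normalized to baseline 0 and unit noise variance.
    Per sample, (interval sum)**2 / length is chi-squared(1) under the null;
    the scan statistic is the sum of these over all samples.
    Returns (statistic, start, end).
    """
    T = len(samples[0])
    max_len = max_len or T
    # Prefix sums make each interval sum an O(1) lookup.
    prefix = []
    for seq in samples:
        acc = [0.0]
        for v in seq:
            acc.append(acc[-1] + v)
        prefix.append(acc)
    best = (0.0, 0, 0)
    for start in range(T):
        for end in range(start + 1, min(T, start + max_len) + 1):
            length = end - start
            stat = sum((p[end] - p[start]) ** 2 / length for p in prefix)
            if stat > best[0]:
                best = (stat, start, end)
    return best


# Hypothetical data: two samples share a shifted segment, one sample is flat.
flat = [0.0] * 10
shifted = [0.0] * 4 + [2.0] * 3 + [0.0] * 3
print(multisample_scan([shifted, shifted, flat]))  # (24.0, 4, 7)
```

    The exhaustive scan costs O(N · T · max_len) for N samples of length T; a sample with no aberration adds only noise to the sum, which is why the statistic can still detect a shared segment present in a fraction of the samples.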

  • Oncogenic BRAF Mutation with CDKN2A Inactivation Is Characteristic of a Subset of Pediatric Malignant Astrocytomas CANCER RESEARCH Schiffman, J. D., Hodgson, J. G., VandenBerg, S. R., Flaherty, P., Polley, M. C., Yu, M., Fisher, P. G., Rowitch, D. H., Ford, J. M., Berger, M. S., Ji, H., Gutmann, D. H., James, C. D. 2010; 70 (2): 512-519


    Malignant astrocytomas are a deadly solid tumor in children. Limited understanding of their underlying genetic basis has contributed to modest progress in developing more effective therapies. In an effort to identify such alterations, we performed a genome-wide search for DNA copy number aberrations (CNA) in a panel of 33 tumors encompassing grade 1 through grade 4 tumors. Genomic amplifications of 10-fold or greater were restricted to grade 3 and 4 astrocytomas and included the MDM4 (1q32), PDGFRA (4q12), MET (7q21), CMYC (8q24), PVT1 (8q24), WNT5B (12p13), and IGF1R (15q26) genes. Homozygous deletions of CDKN2A (9p21), PTEN (10q26), and TP53 (17p13.1) were evident among grade 2 to 4 tumors. BRAF gene rearrangements that were indicated in three tumors prompted the discovery of KIAA1549-BRAF fusion transcripts expressed in 10 of 10 grade 1 astrocytomas and in none of the grade 2 to 4 tumors. In contrast, an oncogenic missense BRAF mutation (BRAF(V600E)) was detected in 7 of 31 grade 2 to 4 tumors but in none of the grade 1 tumors. BRAF(V600E) mutation seems to define a subset of malignant astrocytomas in children, in which there is frequent concomitant homozygous deletion of CDKN2A (five of seven cases). Taken together, these findings highlight BRAF as a frequent mutation target in pediatric astrocytomas, with distinct types of BRAF alteration occurring in grade 1 versus grade 2 to 4 tumors.

    View details for DOI 10.1158/0008-5472.CAN-09-1851

    View details for Web of Science ID 000278485500011

    View details for PubMedID 20068183

  • Targeted deep resequencing of the human cancer genome using next-generation technologies BIOTECHNOLOGY AND GENETIC ENGINEERING REVIEWS, VOL 27 Myllykangas, S., Ji, H. P. 2010; 27: 135-158


    Next-generation sequencing technologies have revolutionized our ability to identify genetic variants, either germline or somatic point mutations, that occur in cancer. Parallelization and miniaturization of DNA sequencing enable massive data throughput, and for the first time, large-scale, nucleotide-resolution views of cancer genomes can be achieved. Systematic, large-scale sequencing surveys have revealed that the genetic spectrum of mutations in cancers appears to be highly complex, with numerous low-frequency bystander somatic variations and a limited number of common, frequently mutated genes. Large sample sizes and deeper resequencing are needed to resolve the clinical and biological relevance of the mutations and to detect somatic variants in heterogeneous samples and cancer cell subpopulations. However, even with next-generation sequencing technologies, the overwhelming size of the human genome and the need for very high fold coverage represent a major challenge for scaling up cancer genome sequencing projects. Assays to target, capture, enrich or partition disease-specific regions of the genome offer immediate solutions for reducing the complexity of the sequencing libraries. Integration of targeted DNA capture assays and next-generation deep resequencing improves the ability to identify clinically and biologically relevant mutations.

    View details for Web of Science ID 000286179900006

    View details for PubMedID 21415896

  • Identification of a biomarker panel using a multiplex proximity ligation assay improves accuracy of pancreatic cancer diagnosis JOURNAL OF TRANSLATIONAL MEDICINE Chang, S. T., Zahn, J. M., Horecka, J., Kunz, P. L., Ford, J. M., Fisher, G. A., Le, Q. T., Chang, D. T., Ji, H., Koong, A. C. 2009; 7


    Pancreatic cancer continues to prove difficult to diagnose clinically. Multiple simultaneous measurements of plasma biomarkers can increase the sensitivity and selectivity of diagnosis. Proximity ligation assay (PLA) is a highly sensitive technique for multiplex detection of biomarkers in plasma with little or no interfering background signal.

    We examined the plasma levels of 21 biomarkers in a clinically defined cohort of 52 locally advanced (Stage II/III) pancreatic ductal adenocarcinoma cases and 43 age-matched controls using a multiplex proximity ligation assay. The optimal biomarker panel for diagnosis was computed using a combination of the PAM algorithm and logistic regression modeling. Biomarkers that were significantly prognostic for survival in combination were determined using univariate and multivariate Cox survival models.

    Three markers, CA19-9, OPN and CHI3L1, measured in multiplex were found to have superior sensitivity for pancreatic cancer compared with CA19-9 alone (93% vs. 80%). In addition, we identified two markers, CEA and CA125, that, when measured simultaneously, have prognostic significance for survival for this clinical stage of pancreatic cancer (p < 0.003).

    A multiplex panel assaying CA19-9, OPN and CHI3L1 in plasma improves the accuracy of pancreatic cancer diagnosis. A panel assaying CEA and CA125 in plasma can predict survival for this clinical cohort of pancreatic cancer patients.

    View details for DOI 10.1186/1479-5876-7-105

    View details for Web of Science ID 000272889900001

    View details for PubMedID 20003342

    View details for PubMedCentralID PMC2796647

  • Association of 7q34 copy number gains and KIAA1549-BRAF gene fusions with juvenile pilocytic astrocytoma Hodgson, J., VandenBerg, S. R., James, C., Perry, A., Gutmann, D., Fisher, P., Ford, J., Ji, H., Schiffman, J. OXFORD UNIV PRESS INC. 2009: 960
  • Molecular inversion probes reveal patterns of 9p21 deletion and copy number aberrations in childhood leukemia CANCER GENETICS AND CYTOGENETICS Schiffman, J. D., Wang, Y., McPherson, L. A., Welch, K., Zhang, N., Davis, R., Lacayo, N. J., Dahl, G. V., Faham, M., Ford, J. M., Ji, H. P. 2009; 193 (1): 9-18


    Childhood leukemia, which accounts for >30% of newly diagnosed childhood malignancies, is one of the leading causes of death for children with cancer. Genome-wide studies using microarray chips to identify copy number changes in human cancer are becoming more common. In this pilot study, 45 pediatric leukemia samples were analyzed for gene copy aberrations using novel molecular inversion probe (MIP) technology. Acute leukemia subtypes included precursor B-cell acute lymphoblastic leukemia (ALL) (n=23), precursor T-cell ALL (n=6), and acute myeloid leukemia (n=14). The MIP analysis identified 69 regions of recurring copy number changes, of which 41 have not been identified with other DNA microarray platforms. Copy number gains and losses were validated in 98% of clinical karyotypes and 100% of fluorescence in situ hybridization studies available. We report unique patterns of copy number loss in samples with 9p21.3 (CDKN2A) deletion in the precursor B-cell ALL patients, compared with the precursor T-cell ALL patients. MIPs represent an attractive technology for identifying novel copy number aberrations, validating previously reported copy number changes, and translating molecular findings into clinically relevant targets for further investigation.

    View details for DOI 10.1016/j.cancergencyto.2009.03.005

    View details for Web of Science ID 000268922900002

    View details for PubMedID 19602459

    View details for PubMedCentralID PMC2776674

  • Paired phospho-proteomic and genomic analyses reveal functionally distinct subclones in refractory pediatric acute myeloid leukemia Simonds, E., Schiffman, J., Gramatges, M., Dahl, G., Ford, J., Lacayo, N., Ji, H., Nolan, G. AMER ASSOC CANCER RESEARCH. 2009
  • Disperse-a software system for design of selector probes for exon resequencing applications BIOINFORMATICS Stenberg, J., Zhang, M., Ji, H. 2009; 25 (5): 666-667


    Selector probes enable the amplification of many selected regions of the genome in multiplex. Disperse is a software pipeline that automates the procedure of designing selector probes for exon resequencing applications. Software and documentation are available at

    View details for DOI 10.1093/bioinformatics/btp001

    View details for Web of Science ID 000263834600018

    View details for PubMedID 19158162

  • Molecular inversion probe assay for allelic quantitation. Methods in molecular biology (Clifton, N.J.) Ji, H., Welch, K. 2009; 556: 67-87


    Molecular inversion probe (MIP) technology has been demonstrated to be a robust platform for large-scale dual genotyping and copy number analysis. Applications in human genomic and genetic studies include the possibility of running dual germline genotyping and combined copy number variation ascertainment. MIPs analyze large numbers of specific genetic target sequences in parallel, relying on interrogation of a barcode tag, rather than direct hybridization of genomic DNA to an array. The MIP approach does not replace, but is complementary to many of the copy number technologies being performed today. Some specific advantages of MIP technology include: less DNA required (37 ng vs. 250 ng), DNA quality less important, more dynamic range (amplifications detected up to copy number 60), allele-specific information "cleaner" (less SNP cross-talk/contamination), and quality of markers better (fewer individual MIPs versus SNPs needed to identify copy number changes). MIPs can be considered a candidate gene (targeted whole genome) approach and can find specific areas of interest that otherwise may be missed with other methods.

    View details for DOI 10.1007/978-1-60327-192-9_6

    View details for PubMedID 19488872

  • Next-generation DNA sequencing NATURE BIOTECHNOLOGY Shendure, J., Ji, H. 2008; 26 (10): 1135-1145


    DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.

    View details for DOI 10.1038/nbt1486

    View details for Web of Science ID 000259926000028

    View details for PubMedID 18846087

  • FOXM1 overexpression and DNA amplification in pediatric astrocytomas Hodgson, G., Vandenberg, S., Fisher, P., Yu, M., James, C., Rowitch, D., Ford, J., Ji, H., Schiffman, J. OXFORD UNIV PRESS INC. 2008: 805–6
  • Analysis of Genomic Instability in Colorectal Carcinoma Flaherty, P., Davis, R. W., Ji, H. FEDERATION AMER SOC EXP BIOL. 2008
  • Gene-specific delineation of copy number aberrations in follicular lymphoma with molecular inversion probes 49th Annual Meeting of the American-Society-of-Hematology Ji, H. P., Welch, K. M., Wang, Y., Faham, M., Akasaka, T., Czerwinski, D., Davis, R. W., Levy, R. AMER SOC HEMATOLOGY. 2007: 766A–767A
  • Molecular Inversion Probes (MIPs) identify novel areas of allelic imbalance in childhood leukemia Schiffman, J. D., Welch, K., Davis, R., Lacayo, N. J., Dahl, G. V., Wang, Y., Faham, M., Ford, J. M., Ji, H. P. AMER SOC HEMATOLOGY. 2007: 431A
  • Adapting molecular inversion probe (MIP) technology for allele quantification in childhood leukemia Schiffman, J. D., Welch, K. M., Davis, R., Dahl, G. V., Lacayo, N. J., Faham, M., Ford, J. M., Ji, H. AMER SOC CLINICAL ONCOLOGY. 2007
  • Multigene amplification and massively parallel sequencing for cancer mutation discovery PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Dahl, F., Stenberg, J., Fredriksson, S., Welch, K., Zhang, M., Nilsson, M., Bicknell, D., Bodmer, W. F., Davis, R. W., Ji, H. 2007; 104 (22): 9387-9392


    We have developed a procedure for massively parallel resequencing of multiple human genes by combining a highly multiplexed and target-specific amplification process with a high-throughput parallel sequencing technology. The amplification process is based on oligonucleotide constructs, called selectors, that guide the circularization of specific DNA target regions. Subsequently, the circularized target sequences are amplified in multiplex and analyzed by using a highly parallel sequencing-by-synthesis technology. As a proof-of-concept study, we demonstrate parallel resequencing of 10 cancer genes covering 177 exons with average sequence coverage per sample of 93%. Seven cancer cell lines and one normal genomic DNA sample were studied with multiple mutations and polymorphisms identified among the 10 genes. Mutations and polymorphisms in the TP53 gene were confirmed by traditional sequencing.

    View details for DOI 10.1073/pnas.0702165104

    View details for Web of Science ID 000246935700055

    View details for PubMedID 17517648

    View details for PubMedCentralID PMC1871563

  • Multiplex amplification of all coding sequences within 10 cancer genes by Gene-Collector NUCLEIC ACIDS RESEARCH Fredriksson, S., Baner, J., Dahl, F., Chu, A., Ji, H., Welch, K., Davis, R. W. 2007; 35 (7)


    Herein we present Gene-Collector, a method for multiplex amplification of nucleic acids. The procedure has been employed to successfully amplify the coding sequences of 10 human cancer genes in one assay with uniform abundance of the final products. Amplification is initiated by multiplex PCR, in this case with 170 primer pairs. Each PCR product is then specifically circularized by ligation on a Collector probe capable of juxtapositioning only the perfectly matched cognate primer pairs. Amplification artifacts typically associated with multiplex PCR that uses many primer pairs, such as false amplicons and primer-dimers, are not circularized and are degraded by exonuclease treatment. Circular DNA molecules are then further enriched by randomly primed rolling circle replication. Amplification was successful for 90% of the targeted amplicons, as seen by hybridization to a custom resequencing DNA microarray. Real-time quantitative PCR revealed that 96% of the amplification products were within 4-fold of the average abundance. Gene-Collector has utility for numerous applications such as high-throughput resequencing, SNP analyses and pathogen detection.

    View details for DOI 10.1093/nar/gkm078

    View details for Web of Science ID 000246294700001

    View details for PubMedID 17317684

    View details for PubMedCentralID PMC1874629
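    The uniformity figure in this abstract (96% of products within 4-fold of the average abundance) is a simple summary that can be computed from real-time qPCR cycle thresholds. A minimal sketch, assuming perfect doubling per cycle so that relative abundance is 2^(-Ct), and reading "average" as the arithmetic mean of those abundances; both readings are assumptions, and the function name and Ct values are hypothetical.

```python
def fraction_within_fold(ct_values, fold=4.0):
    """Fraction of amplification products whose abundance lies within
    `fold` of the mean abundance (mean/fold <= x <= mean*fold).

    Abundance is taken as 2**(-Ct), i.e. 100% PCR efficiency is assumed.
    """
    abundances = [2.0 ** -ct for ct in ct_values]
    avg = sum(abundances) / len(abundances)
    inside = sum(1 for a in abundances if avg / fold <= a <= avg * fold)
    return inside / len(abundances)


# Hypothetical Ct values for five products; the Ct-25 product is ~30x under-represented.
print(fraction_within_fold([20, 20, 21, 19, 25]))  # 0.8
```

    Under the doubling assumption, a 4-fold window corresponds to about two PCR cycles on either side of the mean.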

  • Multiplexed protein detection by proximity ligation for cancer biomarker validation NATURE METHODS Fredriksson, S., Dixon, W., Ji, H., Koong, A. C., Mindrinos, M., Davis, R. W. 2007; 4 (4): 327-329


    We present a proximity ligation-based multiplexed protein detection procedure in which several selected proteins can be detected via unique nucleic-acid identifiers and subsequently quantified by real-time PCR. The assay requires a 1-µl sample, has low-femtomolar sensitivity and a five-log linear range, and allows for modular multiplexing without cross-reactivity. The procedure can use a single polyclonal antibody batch for each target protein, simplifying affinity-reagent creation for new biomarker candidates.

    View details for DOI 10.1038/NMETH1020

    View details for Web of Science ID 000245584900013

    View details for PubMedID 17369836

  • Under-expression of Kalirin-7 increases iNOS activity in cultured cells and correlates to elevated iNOS activity in Alzheimer's disease hippocampus JOURNAL OF ALZHEIMERS DISEASE Youn, H., Ji, I., Ji, H. P., Markesbery, W. R., Ji, T. H. 2007; 12 (3): 271-281


    Recently, it has been reported that Kalirin gene transcripts are under-expressed in AD hippocampal specimens compared with controls. The Kalirin gene generates a dozen Kalirin isoforms. Kalirin-7 is the predominant isoform expressed in the adult brain and plays crucial roles in the growth and maintenance of neurons, yet its role in human diseases is unknown. We report that Kalirin-7 is significantly diminished at both the mRNA and protein levels in hippocampus specimens from 19 AD patients compared with specimens from 15 controls. Kalirin-7 associates with iNOS in the hippocampus; consequently, less Kalirin-7 is complexed with iNOS in AD hippocampus extracts than in control hippocampus extracts. In cultured cells, Kalirin-7 associates with iNOS and down-regulates the enzyme activity. The down-regulation is attributed to the highly conserved 33-amino-acid sequence K(617)-H(649) of the 1,663-amino-acid Kalirin-7. Remarkably, iNOS activity is considerably higher in the hippocampus specimens from AD patients than in the specimens from 15 controls. These observations suggest that the under-expression of Kalirin-7 in AD hippocampus correlates with the elevated iNOS activity.

    View details for Web of Science ID 000252300000009

    View details for PubMedID 18057561

  • Reproducibility Probability Score - incorporating measurement variability across laboratories for gene selection NATURE BIOTECHNOLOGY Lin, G., He, X., Ji, H., Shi, L., Davis, R. W., Zhong, S. 2006; 24 (12): 1476-1477

    View details for Web of Science ID 000242795800015

    View details for PubMedID 17160039

  • Data quality in genomics and microarrays NATURE BIOTECHNOLOGY Ji, H., Davis, R. W. 2006; 24 (9): 1112-1113

    View details for DOI 10.1038/nbt0906-1108

    View details for Web of Science ID 000240495200031

    View details for PubMedID 16964224

    View details for PubMedCentralID PMC2943412

  • The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements NATURE BIOTECHNOLOGY Shi, L., Reid, L. H., Jones, W. D., Shippy, R., Warrington, J. A., Baker, S. C., Collins, P. J., de Longueville, F., Kawasaki, E. S., Lee, K. Y., Luo, Y., Sun, Y. A., Willey, J. C., Setterquist, R. A., Fischer, G. M., Tong, W., Dragan, Y. P., Dix, D. J., Frueh, F. W., Goodsaid, F. M., Herman, D., Jensen, R. V., Johnson, C. D., Lobenhofer, E. K., Puri, R. K., Scherf, U., Thierry-Mieg, J., Wang, C., Wilson, M., Wolber, P. K., Zhang, L., Amur, S., Bao, W., Barbacioru, C. C., Lucas, A. B., Bertholet, V., Boysen, C., Bromley, B., Brown, D., Brunner, A., Canales, R., Cao, X. M., Cebula, T. A., Chen, J. J., Cheng, J., Chu, T., Chudin, E., Corson, J., Corton, J. C., Croner, L. J., Davies, C., Davison, T. S., Delenstarr, G., Deng, X., Dorris, D., Eklund, A. C., Fan, X., Fang, H., Fulmer-Smentek, S., Fuscoe, J. C., Gallagher, K., Ge, W., Guo, L., Guo, X., Hager, J., Haje, P. K., Han, J., Han, T., Harbottle, H. C., Harris, S. C., Hatchwell, E., Hauser, C. A., Hester, S., Hong, H., Hurban, P., Jackson, S. A., Ji, H., Knight, C. R., Kuo, W. P., LeClerc, J. E., Levy, S., Li, Q., Liu, C., Liu, Y., Lombardi, M. J., Ma, Y., Magnuson, S. R., Maqsodi, B., McDaniel, T., Mei, N., Myklebost, O., Ning, B., Novoradovskaya, N., Orr, M. S., Osborn, T. W., Papallo, A., Patterson, T. A., Perkins, R. G., Peters, E. H., Peterson, R., Philips, K. L., Pine, P. S., Pusztai, L., Qian, F., Ren, H., Rosen, M., Rosenzweig, B. A., Samaha, R. R., Schena, M., Schroth, G. P., Shchegrova, S., Smith, D. D., Staedtler, F., Su, Z., Sun, H., Szallasi, Z., Tezak, Z., Thierry-Mieg, D., Thompson, K. L., Tikhonova, I., Turpaz, Y., Vallanat, B., Van, C., Walker, S. J., Wang, S. J., Wang, Y., Wolfinger, R., Wong, A., Wu, J., Xiao, C., Xie, Q., Xu, J., Yang, W., Zhang, L., Zhong, S., Zong, Y., Slikker, W. 2006; 24 (9): 1151-1161


    Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

    View details for DOI 10.1038/nbt1239

    View details for Web of Science ID 000240495200036

    View details for PubMedID 16964229

    View details for PubMedCentralID PMC3272078

  • Molecular inversion probe analysis of gene copy alterations reveals distinct categories of colorectal carcinoma CANCER RESEARCH Ji, H., Kumm, J., Zhang, M., Farnam, K., Salari, K., Faham, M., Ford, J. M., Davis, R. W. 2006; 66 (16): 7910-7919


    Genomic instability is a major feature of neoplastic development in colorectal carcinoma and other cancers. Specific genomic instability events, such as chromosomal deletions and other alterations in gene copy number, have potential utility as biologically relevant prognostic biomarkers. For example, genomic deletions on chromosome arm 18q are an indicator of colorectal carcinoma behavior and potentially useful as a prognostic indicator. Adapting a novel genomic technology called molecular inversion probes, which can determine gene copy alterations such as genomic deletions, we designed a set of probes to interrogate several hundred individual exons of >200 cancer genes, with an overall distribution covering all chromosome arms. In addition, >100 probes were designed in close proximity to microsatellite markers on chromosome arm 18q. We analyzed a set of colorectal carcinoma cell lines and primary colorectal tumor samples for gene copy alterations and deletion mutations in exons. Based on clustering analysis, we distinguished the different categories of genomic instability among the colorectal cancer cell lines. Our analysis of primary tumors uncovered several distinct categories of colorectal carcinoma, each with specific patterns of 18q deletions and deletion mutations in specific genes. This finding has potential clinical ramifications given the use of 18q loss of heterozygosity as a potential indicator for adjuvant treatment in stage II colorectal carcinoma.

    View details for DOI 10.1158/0008-5472.CAN-06-0595

    View details for Web of Science ID 000239828200013

    View details for PubMedID 16912164

    View details for PubMedCentralID PMC2943417

  • Analysis of genomic DNA copy number alterations in chromosome arm 18q demonstrates distinct molecular categories of colorectal carcinoma. Ji, H., Zhang, M., Farnam, K., Salari, K., Davis, R., Ford, J. M. AMER SOC CLINICAL ONCOLOGY. 2006: 542S

  • A functional assay for mutations in tumor suppressor genes caused by mismatch repair deficiency HUMAN MOLECULAR GENETICS Ji, H. P., King, M. C. 2001; 10 (24): 2737-2743


    The coding sequences of multiple human tumor suppressor genes include microsatellite sequences that are prone to mutations. Saccharomyces cerevisiae strains deficient in DNA mismatch repair (MMR) can be used to determine de novo mutation rates of these human tumor suppressor genes as well as any other gene sequence. Microsatellites in human TGFBR2, PTEN and APC genes were placed in yeast vectors and analyzed in isogenic yeast strains that were wild-type or deletion mutants for MSH2 or MLH1. In MMR-deficient strains, the vector containing the (A)10 microsatellite sequence of TGFBR2 had a mutation rate (mutations/cell division) of 1.4 × 10^-4, compared to a mutation rate of 1.7 × 10^-6 in the wild-type strain. In MMR-deficient strains, mutation rates in PTEN and APC were also elevated above background levels. PTEN mutation rates were higher in both the msh2 (4.4 × 10^-5) and mlh1 (2.3 × 10^-5) strains. APC mutation rates in the msh2 strain (2.4 × 10^-6) and the mlh1 strain (1.7 × 10^-6) were also significantly, but less dramatically, elevated over background. Mutations selected for in the yeast screen were identical to those previously observed in human tumor samples with microsatellite instability (MSI). This functional assay has applicability in providing quantitative data about microsatellite mutation rates caused by MMR deficiency in any human tumor suppressor gene sequence. It can also be applied as a genetic screen to identify new genes that are vulnerable to such microsatellite mutations and thus may be involved in the neoplastic development of tumors with MSI.

    View details for Web of Science ID 000172867500001

    View details for PubMedID 11734538

  • Spondyloepimetaphyseal dysplasia with joint laxity (SEMDJL): Presentation in two unrelated patients in the United States AMERICAN JOURNAL OF MEDICAL GENETICS Smith, W., Ji, H. L., Mouradian, W., Pagon, R. A. 1999; 86 (3): 245-252


    This is a report of two North American patients with spondyloepimetaphyseal dysplasia with joint laxity, an uncommon autosomal recessive skeletal dysplasia rarely reported outside of South Africa. Patients with SEMDJL have vertebral abnormalities and ligamentous laxity that results in spinal misalignment and progressive severe kyphoscoliosis, thoracic asymmetry, and respiratory compromise resulting in early death. Nonaxial skeletal involvement includes elbow deformities with radial head dislocation, dislocated hips, clubbed feet, and tapered fingers with spatulate distal phalanges. Many affected children have an oval face, flat midface, prominent eyes with blue sclerae, and a long philtrum. Palatal abnormalities and congenital heart disease are also observed. Diagnosis in infancy may be difficult because many of the typical findings are not apparent early and only evolve over time. We review the physical and radiographic findings in two unrelated patients with this disorder in order to increase the awareness of this disorder, particularly for clinicians outside of South Africa.

    View details for Web of Science ID 000082714300010

    View details for PubMedID 10482874

  • Molecular classification of the inherited hamartoma polyposis syndromes: Clearing the muddied waters AMERICAN JOURNAL OF HUMAN GENETICS Eng, C., Ji, H. L. 1998; 62 (5): 1020-1022

    View details for Web of Science ID 000073487000004

    View details for PubMedID 9545417

  • Inherited mutations in PTEN that are associated with breast cancer, Cowden disease, and juvenile polyposis AMERICAN JOURNAL OF HUMAN GENETICS Lynch, E. D., OSTERMEYER, E. A., Lee, M. K., Arena, J. F., Ji, H. L., Dann, J., Swisshelm, K., Suchard, D., MACLEOD, P. M., KVINNSLAND, S., Gjertsen, B. T., Heimdal, K., Lubs, H., Moller, P., KING, M. C. 1997; 61 (6): 1254-1260


    PTEN, a protein tyrosine phosphatase with homology to tensin, is a tumor-suppressor gene on chromosome 10q23. Somatic mutations in PTEN occur in multiple tumors, most markedly glioblastomas. Germ-line mutations in PTEN are responsible for Cowden disease (CD), a rare autosomal dominant multiple-hamartoma syndrome. PTEN was sequenced from constitutional DNA from 25 families. Germ-line PTEN mutations were detected in all five families with both breast cancer and CD, in one family with juvenile polyposis syndrome, and in one of four families with breast and thyroid tumors. In this last case, signs of CD were subtle and were diagnosed only in the context of mutation analysis. PTEN mutations were not detected in 13 families at high risk of breast and/or ovarian cancer. No PTEN-coding-sequence polymorphisms were detected in 70 independent chromosomes. Seven PTEN germ-line mutations occurred, five nonsense and two missense mutations, in six of nine PTEN exons. The wild-type PTEN allele was lost from renal, uterine, breast, and thyroid tumors from a single patient. Loss of PTEN expression was an early event, reflected in loss of the wild-type allele in DNA from normal tissue adjacent to the breast and thyroid tumors. In RNA from normal tissues from three families, mutant transcripts appeared unstable. Germ-line PTEN mutations predispose to breast cancer in association with CD, although the signs of CD may be subtle.

    View details for Web of Science ID 000071555900007

    View details for PubMedID 9399897



    A collection of yeast strains bearing single marked Ty1 insertions on chromosome III was generated. Over 100 such insertions were physically mapped by pulsed-field gel electrophoresis. These insertions are very nonrandomly distributed. Thirty-two such insertions were cloned by the inverted PCR technique, and the flanking DNA sequences were determined. The sequenced insertions all fell within a few very limited regions of chromosome III. Most of these regions contained tRNA coding regions and/or LTRs of preexisting transposable elements. Open reading frames were disrupted at a far lower frequency than expected for random transposition. The results suggest that the Ty1 integration machinery can detect regions of the genome that may represent "safe havens" for insertion. These regions of the genome do not contain any special DNA sequences, nor do they behave as particularly good targets for Ty1 integration in vitro, suggesting that the targeted regions have special properties allowing specific recognition in vivo.

    View details for Web of Science ID A1993LF06100016

    View details for PubMedID 8388781