Clinical Focus


  • Cancer > GI Oncology
  • Medical Oncology
  • Oncology (Cancer)
  • Gastrointestinal Neoplasms
  • Inherited Cancer Disorders
  • Immunotherapy in gastrointestinal cancers

Academic Appointments


Administrative Appointments


  • Department of Medicine Team Science Division Representative, Department of Medicine, Stanford University (2022 - Present)
  • Senior Associate Director, Stanford Genome Technology Center (2008 - 2020)

Honors & Awards


  • Physician-Scientist Fellowship Award, Howard Hughes Medical Institute (1998)
  • Scholar-in-Training Award for Research Achievement, American Association for Cancer Research (2005)
  • Merit Award for Research Achievement, American Society of Clinical Oncology Foundation (2006)
  • Physician Scientist Early Career Award, Howard Hughes Medical Institute (2008)
  • Clinical Scientist Development Award, Doris Duke Charitable Foundation (2009)
  • Research Scholar Award, American Cancer Society (2013)

Professional Education


  • Board Certification: American Board of Internal Medicine, Medical Oncology (2004)
  • Residency: University of Iowa Hospitals and Clinics (1996) IA
  • Residency: University of Washington Medical Center Dept of Medicine (2001) WA
  • Medical Education: Johns Hopkins University School of Medicine (1994) MD
  • Fellowship: Stanford University Hospital -Clinical Excellence Research Center (2005) CA
  • B.A., Reed College, Biology
  • M.D., Johns Hopkins University, Medicine

Current Research and Scholarly Interests


Our research group integrates new molecular technology development, advanced computational methods and genome biology to identify targets for therapy in cancer. We are pursuing projects focused on developing new therapies for stomach, bile duct and colon cancer. We are also studying the basis of genomic instability by examining chromosome structure.

Ongoing projects include:

1) Immunogenomic approaches to study cancer's interaction with the immune system and improve our understanding of immunotherapy

2) Identification of kinase interactions which can improve targeted therapy strategies

3) Use of advanced genome sequencing technologies including nanopore sequencers to understand the role of cancer rearrangements in response to therapy

4) Identifying genes that increase the risk of developing cancer

5) Developing new approaches for monitoring cancer from circulating DNA

We are also developing new technologies for DNA-based data storage.

Clinical Trials


  • Clinical & Pathological Studies of Upper Gastrointestinal Carcinoma Recruiting

    Our research on the biology of upper gastrointestinal cancers involves the study of tissue samples and cells from biopsies of persons with gastric or esophageal cancer, as well as blood samples from upper gastrointestinal cancer patients and persons at high inherited risk for these cancers. We hope to learn the role genes and proteins play in the development of gastric and esophageal cancer.

  • The Gastric Cancer Foundation: A Gastric Cancer Registry Recruiting

    The Gastric Cancer Registry will combine data acquired via an online questionnaire with genomic data obtained from saliva, blood and tissue samples. Participants include patients with gastric cancer, persons with a family history of gastric cancer in a first- or second-degree relative, and persons with a known germline mutation in their CDH1 (E-Cadherin) gene. The purpose of this registry is to gain a better understanding of the causes of gastric cancer, both environmental and genetic, and to determine whether certain genomic data can predict outcomes of treatment and survival.

All Publications


  • Resolving the 22q11.2 deletion using CTLR-Seq reveals chromosomal rearrangement mechanisms and individual variance in breakpoints. Proceedings of the National Academy of Sciences of the United States of America Zhou, B., Purmann, C., Guo, H., Shin, G., Huang, Y., Pattni, R., Meng, Q., Greer, S. U., Roychowdhury, T., Wood, R. N., Ho, M., Dohna, H. Z., Abyzov, A., Hallmayer, J. F., Wong, W. H., Ji, H. P., Urban, A. E. 2024; 121 (31): e2322834121

    Abstract

    We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.

    View details for DOI 10.1073/pnas.2322834121

    View details for PubMedID 39042694

  • Single cell transcriptomic analysis reveals differences between primary appendiceal tumors Ayala, C. I., Sathe, A., Bai, X., Grimes, S., Lee, B., Ji, H. P. SPRINGER. 2024: S230
  • Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome biology Mason, K., Sathe, A., Hess, P. R., Rong, J., Wu, C. Y., Furth, E., Susztak, K., Levinsohn, J., Ji, H. P., Zhang, N. 2024; 25 (1): 14

    Abstract

    Existing methods for analysis of spatial transcriptomic data focus on delineating the global gene expression variations of cell types across the tissue, rather than local gene expression changes driven by cell-cell interactions. We propose a new statistical procedure called niche-differential expression (niche-DE) analysis that identifies cell-type-specific niche-associated genes, which are differentially expressed within a specific cell type in the context of specific spatial niches. We further develop niche-LR, a method to reveal ligand-receptor signaling mechanisms that underlie niche-differential gene expression patterns. Niche-DE and niche-LR are applicable to low-resolution spot-based spatial transcriptomics data and data that is single-cell or subcellular in resolution.

    View details for DOI 10.1186/s13059-023-03159-6

    View details for PubMedID 38217002

    View details for PubMedCentralID 6765259

  • GITR and TIGIT immunotherapy provokes divergent multicellular responses in the tumor microenvironment of gastrointestinal cancers. Genome medicine Sathe, A., Ayala, C., Bai, X., Grimes, S. M., Lee, B., Kin, C., Shelton, A., Poultsides, G., Ji, H. P. 2023; 15 (1): 100

    Abstract

    Understanding the mechanistic effects of novel immunotherapy agents is critical to improving their successful clinical translation. These effects need to be studied in preclinical models that maintain the heterogeneous tumor microenvironment (TME) and dysfunctional cell states found in a patient's tumor. We investigated immunotherapy perturbations targeting co-stimulatory molecule GITR and co-inhibitory immune checkpoint TIGIT in a patient-derived ex vivo system that maintains the TME in its near-native state. Leveraging single-cell genomics, we identified cell type-specific transcriptional reprogramming in response to immunotherapy perturbations. We generated ex vivo tumor slice cultures from fresh surgical resections of gastric and colon cancer and treated them with GITR agonist or TIGIT antagonist antibodies. We applied paired single-cell RNA and TCR sequencing to the original surgical resections, control, and treated ex vivo tumor slice cultures. We additionally confirmed target expression using multiplex immunofluorescence and validated our findings with RNA in situ hybridization. We confirmed that tumor slice cultures maintained the cell types, transcriptional cell states and proportions of the original surgical resection. The GITR agonist was limited to increasing effector gene expression only in cytotoxic CD8 T cells. Dysfunctional exhausted CD8 T cells did not respond to GITR agonist. In contrast, the TIGIT antagonist increased TCR signaling and activated both cytotoxic and dysfunctional CD8 T cells. This included cells corresponding to TCR clonotypes with features indicative of potential tumor antigen reactivity. The TIGIT antagonist also activated T follicular helper-like cells and dendritic cells, and reduced markers of immunosuppression in regulatory T cells. We identified novel cellular mechanisms of action of GITR and TIGIT immunotherapy in the patients' TME. Unlike the GITR agonist that generated a limited transcriptional response, TIGIT antagonist orchestrated a multicellular response involving CD8 T cells, T follicular helper-like cells, dendritic cells, and regulatory T cells. Our experimental strategy combining single-cell genomics with preclinical models can successfully identify mechanisms of action of novel immunotherapy agents. Understanding the cellular and transcriptional mechanisms of response or resistance will aid in prioritization of targets and their clinical translation.

    View details for DOI 10.1186/s13073-023-01259-3

    View details for PubMedID 38008725

    View details for PubMedCentralID PMC10680277

  • A clinical trial of therapeutic vaccination in lymphoma with serial tumor sampling and single cell analysis. Blood advances Shree, T., Haebe, S. E., Czerwinski, D. K., Eckhert, E., Day, G., Sathe, A., Grimes, S. M., Frank, M. J., Maeda, L., Alizadeh, A. A., Advani, R. H., Hoppe, R. T., Long, S. R., Martin, B. A., Ozawa, M. G., Khodadoust, M. S., Ji, H. P., Levy, R. 2023

    Abstract

    In situ vaccination (ISV) triggers an immune response to tumor-associated antigens at one tumor site that can then tackle disease throughout the body. Here we report clinical and biological results of a phase I/II ISV trial in patients with low-grade lymphoma (NCT02927964) combining an intratumoral TLR9 agonist with local low-dose radiation, and ibrutinib (an inhibitor of B and T cell kinases). Adverse events were predominately low grade. The overall response rate was 50%, including one complete response. All patients experienced tumor reduction at distant sites. Single cell analyses of serial fine needle aspirates from injected and uninjected tumors revealed correlates of clinical response, such as lower CD47 and higher MHCII expression on tumor cells, enhanced T and NK cell effector function, and reduced immune suppression from TGFβ and inhibitory T regulatory 1 cells. While changes at the local injected site were more pronounced, changes at distant uninjected sites more often associated with clinical responses. Functional immune response assays and tracking of T cell receptor sequences provided evidence of treatment-induced tumor-specific T cell responses. Induction of immune effectors and reversal of negative regulators were both important in producing clinically meaningful tumor responses. NCT02927964.

    View details for DOI 10.1182/bloodadvances.2023011589

    View details for PubMedID 37939259

  • Co-Occurrence of Clonally Related Follicular Lymphoma and Histiocytic Sarcoma Haebe, S., Czerwinski, D. K., Sathe, A., Grimes, S., Chen, T., Martin, B., Ji, H., Levy, R., Shree, T. AMER SOC HEMATOLOGY. 2023
  • A spatially mapped gene expression signature for intestinal stem-like cells identifies high-risk precursors of gastric cancer. bioRxiv : the preprint server for biology Huang, R. J., Wichmann, I. A., Su, A., Sathe, A., Shum, M. V., Grimes, S. M., Meka, R., Almeda, A., Bai, X., Shen, J., Nguyen, Q., Amieva, M. R., Hwang, J. H., Ji, H. P. 2023

    Abstract

    Gastric intestinal metaplasia (GIM) is a precancerous lesion that increases gastric cancer (GC) risk. The Operative Link on GIM (OLGIM) is a combined clinical-histopathologic system to risk-stratify patients with GIM. The identification of molecular biomarkers that are indicators for advanced OLGIM lesions may improve cancer prevention efforts. This study was based on clinical and genomic data from four cohorts: 1) GAPS, a GIM cohort with detailed OLGIM severity scoring (N=303 samples); 2) the Cancer Genome Atlas (N=198); 3) a collation of in-house and publicly available scRNA-seq data (N=40), and 4) a spatial validation cohort (N=5) consisting of annotated histology slides of patients with either GC or advanced GIM. We used a multi-omics pipeline to identify, validate and sequentially parse a highly-refined signature of 26 genes which characterize high-risk GIM. Using standard RNA-seq, we analyzed two separate, non-overlapping discovery (N=88) and validation (N=215) sets of GIM. In the discovery phase, we identified 105 upregulated genes specific for high-risk GIM (defined as OLGIM III-IV), of which 100 genes were independently confirmed in the validation set. Spatial transcriptomic profiling revealed 36 of these 100 genes to be expressed in metaplastic foci in GIM. Comparison with bulk GC sequencing data revealed 26 of these genes to be expressed in intestinal-type GC. Single-cell profiling resolved the 26-gene signature to both mature intestinal lineages (goblet cells, enterocytes) and immature intestinal lineages (stem-like cells). A subset of these genes was further validated using single-molecule multiplex fluorescence in situ hybridization. We found certain genes (TFF3 and ANPEP) to mark differentiated intestinal lineages, whereas others (OLFM4 and CPS1) localized to immature cells in the isthmic/crypt region of metaplastic glands, consistent with the findings from scRNAseq analysis. Using an integrated multi-omics approach, we identified a novel 26-gene expression signature for high-OLGIM precursors at increased risk for GC. We found this signature localizes to aberrant intestinal stem-like cells within the metaplastic microenvironment. These findings hold important translational significance for future prevention and early detection efforts.

    View details for DOI 10.1101/2023.09.20.558462

    View details for PubMedID 37786704

    View details for PubMedCentralID PMC10541579

  • Direct measurement of engineered cancer mutations and their transcriptional phenotypes in single cells. Nature biotechnology Kim, H. S., Grimes, S. M., Chen, T., Sathe, A., Lau, B. T., Hwang, G. H., Bae, S., Ji, H. P. 2023

    Abstract

    Genome sequencing studies have identified numerous cancer mutations across a wide spectrum of tumor types, but determining the phenotypic consequence of these mutations remains a challenge. Here, we developed a high-throughput, multiplexed single-cell technology called TISCC-seq to engineer predesignated mutations in cells using CRISPR base editors, directly delineate their genotype among individual cells and determine each mutation's transcriptional phenotype. Long-read sequencing of the target gene's transcript identifies the engineered mutations, and the transcriptome profile from the same set of cells is simultaneously analyzed by short-read sequencing. Through integration, we determine the mutations' genotype and expression phenotype at single-cell resolution. Using cell lines, we engineer and evaluate the impact of >100 TP53 mutations on gene expression. Based on the single-cell gene expression, we classify the mutations as having a functionally significant phenotype.

    View details for DOI 10.1038/s41587-023-01949-8

    View details for PubMedID 37697151

    View details for PubMedCentralID 8018281

  • Follicular lymphoma evolves with a surmountable dependency on acquired glycosylation motifs in the B cell receptor. Blood Haebe, S. E., Day, G., Czerwinski, D. K., Sathe, A., Grimes, S. M., Chen, T., Long, S. R., Martin, B. A., Ozawa, M. G., Ji, H. P., Shree, T., Levy, R. 2023

    Abstract

    An early event in the genesis of follicular lymphoma (FL) is the acquisition of new glycosylation motifs in the B cell receptor (BCR) due to gene rearrangement and/or somatic hypermutation. These N-linked glycosylation motifs (N-motifs) contain mannose-terminated glycans and can interact with lectins in the tumor microenvironment, activating the tumor BCR pathway. N-motifs are stable during FL evolution suggesting that FL tumor cells are dependent on them for their survival. Here, we investigated the dynamics and potential impact of N-motif prevalence in FL at the single cell level across distinct tumor sites and over time in 17 patients. While most patients had acquired at least one N-motif as an early event, we also found (i) cases without N-motifs in the heavy or light chains at any tumor site or timepoint and (ii) cases with discordant N-motif patterns across different tumor sites. Inferring phylogenetic trees for the patients with discordant patterns, we observed that both N-motif-positive and N-motif-negative tumor subclones could be selected and expanded during tumor evolution. Comparing N-motif-positive to N-motif-negative tumor cells within a patient revealed higher expression of genes involved in the BCR pathway and inflammatory response, while tumor cells without N-motifs had higher activity of pathways involved in energy metabolism. In conclusion, while acquired N-motifs likely support FL pathogenesis through antigen-independent BCR signaling in most FL patients, N-motif-negative tumor cells can also be selected and expanded and may depend more heavily on altered metabolism for competitive survival.

    View details for DOI 10.1182/blood.2023020360

    View details for PubMedID 37683139

  • Single-cell multi-gene identification of somatic mutations and gene rearrangements in cancer. NAR cancer Grimes, S. M., Kim, H. S., Roy, S., Sathe, A., Ayala, C. I., Bai, X., Almeda-Notestine, A. F., Haebe, S., Shree, T., Levy, R., Lau, B. T., Ji, H. P. 2023; 5 (3): zcad034

    Abstract

    In this proof-of-concept study, we developed a single-cell method that provides genotypes of somatic alterations found in coding regions of messenger RNAs and integrates these transcript-based variants with their matching cell transcriptomes. We used nanopore adaptive sampling on single-cell complementary DNA libraries to validate coding variants in target gene transcripts, and short-read sequencing to characterize cell types harboring the mutations. CRISPR edits for 16 targets were identified using a cancer cell line, and known variants in the cell line were validated using a 352-gene panel. Variants in primary cancer samples were validated using target gene panels ranging from 161 to 529 genes. A gene rearrangement was also identified in one patient, with the rearrangement occurring in two distinct tumor sites.

    View details for DOI 10.1093/narcan/zcad034

    View details for PubMedID 37435532

    View details for PubMedCentralID PMC10331933

  • Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell reports methods Lee, H., Greer, S. U., Pavlichin, D. S., Zhou, B., Urban, A. E., Weissman, T., Ji, H. P. 2023; 3 (8): 100543

    Abstract

    The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.

    View details for DOI 10.1016/j.crmeth.2023.100543

    View details for PubMedID 37671027

    View details for PubMedCentralID PMC10475782

  • Transitioning single-cell genomics into the clinic. Nature reviews. Genetics Lim, J., Chin, V., Fairfax, K., Moutinho, C., Suan, D., Ji, H., Powell, J. E. 2023

    Abstract

    The use of genomics is firmly established in clinical practice, resulting in innovations across a wide range of disciplines such as genetic screening, rare disease diagnosis and molecularly guided therapy choice. This new field of genomic medicine has led to improvements in patient outcomes. However, most clinical applications of genomics rely on information generated from bulk approaches, which do not directly capture the genomic variation that underlies cellular heterogeneity. With the advent of single-cell technologies, research is rapidly uncovering how genomic data at cellular resolution can be used to understand disease pathology and mechanisms. Both DNA-based and RNA-based single-cell technologies have the potential to improve existing clinical applications and open new application spaces for genomics in clinical practice, with oncology, immunology and haematology poised for initial adoption. However, challenges in translating cellular genomics from research to a clinical setting must first be overcome.

    View details for DOI 10.1038/s41576-023-00613-w

    View details for PubMedID 37258725

    View details for PubMedCentralID 5835770

  • Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing. Scientific reports Lau, B., Chandak, S., Roy, S., Tatwawadi, K., Wootters, M., Weissman, T., Ji, H. P. 2023; 13 (1): 8514

    Abstract

    The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.

    View details for DOI 10.1038/s41598-023-29575-z

    View details for PubMedID 37231057

  • Short Tandem Repeat DNA Profiling Using Perylene-Oligonucleotide Fluorescence Assay. Analytical chemistry Hernandez Bustos, A., Martiny, E., Bom Pedersen, N., Parvathaneni, R. P., Hansen, J., Ji, H. P., Astakhova, K. 2023

    Abstract

    We report an amplification-free genotyping method to determine the number of human short tandem repeats (STRs). DNA-based STR profiling is a robust method for genetic identification purposes such as forensics and biobanking and for identifying specific molecular subtypes of cancer. STR detection requires polymerase amplification, which introduces errors that obscure the correct genotype. We developed a new method that requires no polymerase. First, we synthesized perylene-nucleoside reagents and incorporated them into oligonucleotide probes that recognize five common human STRs. Using these probes and a bead-based hybridization approach, accurate STR detection was achieved in only 1.5 h, including DNA preparation steps, with up to a 1000-fold target DNA enrichment. This method was comparable to PCR-based assays. Using standard fluorometry, the limit of detection was 2.00 ± 0.07 pM for a given target. We used this assay to accurately identify STRs from 50 human subjects, achieving >98% consensus with sequencing data for STR genotyping.

    View details for DOI 10.1021/acs.analchem.3c00063

    View details for PubMedID 37183373

  • Pangenome graph construction from genome alignments with Minigraph-Cactus NATURE BIOTECHNOLOGY Hickey, G., Monlong, J., Ebler, J., Novak, A. M., Eizenga, J. M., Gao, Y., Marschall, T., Li, H., Paten, B., Abel, H. J., Antonacci-Fulton, L. L., Asri, M., Baid, G., Baker, C. A., Belyaeva, A., Billis, K., Bourque, G., Buonaiuto, S., Carroll, A., Chaisson, M. P., Chang, P., Chang, X. H., Cheng, H., Chu, J., Cody, S., Colonna, V., Cook, D. E., Cook-Deegan, R. M., Cornejo, O. E., Diekhans, M., Doerr, D., Ebert, P., Ebler, J., Eichler, E. E., Eizenga, J. M., Fairley, S., Fedrigo, O., Felsenfeld, A. L., Feng, X., Fischer, C., Flicek, P., Formenti, G., Frankish, A., Fulton, R. S., Gao, Y., Garg, S., Garrison, E., Garrison, N. A., Giron, C., Green, R. E., Groza, C., Guarracino, A., Haggerty, L., Hall, I. M., Harvey, W. T., Haukness, M., Haussler, D., Heumos, S., Hickey, G., Hoekzema, K., Hourlier, T., Howe, K., Jain, M., Jarvis, E. D., Ji, H. P., Kenny, E. E., Koenig, B. A., Kolesnikov, A., Korbel, J. O., Kordosky, J., Koren, S., Lee, H., Lewis, A. P., Liao, W., Lu, S., Lu, T., Lucas, J. K., Hugo, M., Santiago, M., Marijon, P., Markello, C., Marschall, T., Martin, F. J., McCartney, A., McDaniel, J., Miga, K. H., Mitchell, M. W., Monlong, J., Mountcastle, J., Munson, K. M., Mwaniki, M., Nattestad, M., Novak, A. M., Nurk, S., Olsen, H. E., Olson, N. D., Pesout, T., Phillippy, A. M., Popejoy, A. B., Porubsky, D., Prins, P., Puiu, D., Rautiainen, M., Regier, A. A., Rhie, A., Sacco, S., Sanders, A. D., Schneider, V. A., Schultz, B., Shafin, K., Sibbesen, J. A., Siren, J., Smith, M. W., Sofia, H. J., Abou Tayoun, A. N., Thibaud-Nissen, F., Tomlinson, C., Tricomi, F., Villani, F., Vollger, M. R., Wagner, J., Walenz, B., Wang, T., Wood, J. D., Zimin, A., Zook, J. M., Human Pangenome Reference 2023

    Abstract

    Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.

    View details for DOI 10.1038/s41587-023-01793-w

    View details for Web of Science ID 000992565300001

    View details for PubMedID 37165083

    View details for PubMedCentralID 8006571

  • Single-molecule methylation profiles of cell-free DNA in cancer with nanopore sequencing. Genome medicine Lau, B. T., Almeda, A., Schauer, M., McNamara, M., Bai, X., Meng, Q., Partha, M., Grimes, S. M., Lee, H., Heestand, G. M., Ji, H. P. 2023; 15 (1): 33

    Abstract

    Epigenetic characterization of cell-free DNA (cfDNA) is an emerging approach for detecting and characterizing diseases such as cancer. We developed a strategy using nanopore-based single-molecule sequencing to measure cfDNA methylomes. This approach generated up to 200 million reads for a single cfDNA sample from cancer patients, an order of magnitude improvement over existing nanopore sequencing methods. We developed a single-molecule classifier to determine whether individual reads originated from a tumor or immune cells. Leveraging methylomes of matched tumors and immune cells, we characterized cfDNA methylomes of cancer patients for longitudinal monitoring during treatment.

    View details for DOI 10.1186/s13073-023-01178-3

    View details for PubMedID 37138315

    View details for PubMedCentralID 1283450

  • A draft human pangenome reference. Nature Liao, W. W., Asri, M., Ebler, J., Doerr, D., Haukness, M., Hickey, G., Lu, S., Lucas, J. K., Monlong, J., Abel, H. J., Buonaiuto, S., Chang, X. H., Cheng, H., Chu, J., Colonna, V., Eizenga, J. M., Feng, X., Fischer, C., Fulton, R. S., Garg, S., Groza, C., Guarracino, A., Harvey, W. T., Heumos, S., Howe, K., Jain, M., Lu, T. Y., Markello, C., Martin, F. J., Mitchell, M. W., Munson, K. M., Mwaniki, M. N., Novak, A. M., Olsen, H. E., Pesout, T., Porubsky, D., Prins, P., Sibbesen, J. A., Sirén, J., Tomlinson, C., Villani, F., Vollger, M. R., Antonacci-Fulton, L. L., Baid, G., Baker, C. A., Belyaeva, A., Billis, K., Carroll, A., Chang, P. C., Cody, S., Cook, D. E., Cook-Deegan, R. M., Cornejo, O. E., Diekhans, M., Ebert, P., Fairley, S., Fedrigo, O., Felsenfeld, A. L., Formenti, G., Frankish, A., Gao, Y., Garrison, N. A., Giron, C. G., Green, R. E., Haggerty, L., Hoekzema, K., Hourlier, T., Ji, H. P., Kenny, E. E., Koenig, B. A., Kolesnikov, A., Korbel, J. O., Kordosky, J., Koren, S., Lee, H., Lewis, A. P., Magalhães, H., Marco-Sola, S., Marijon, P., McCartney, A., McDaniel, J., Mountcastle, J., Nattestad, M., Nurk, S., Olson, N. D., Popejoy, A. B., Puiu, D., Rautiainen, M., Regier, A. A., Rhie, A., Sacco, S., Sanders, A. D., Schneider, V. A., Schultz, B. I., Shafin, K., Smith, M. W., Sofia, H. J., Abou Tayoun, A. N., Thibaud-Nissen, F., Tricomi, F. F., Wagner, J., Walenz, B., Wood, J. M., Zimin, A. V., Bourque, G., Chaisson, M. J., Flicek, P., Phillippy, A. M., Zook, J. M., Eichler, E. E., Haussler, D., Wang, T., Jarvis, E. D., Miga, K. H., Garrison, E., Marschall, T., Hall, I. M., Li, H., Paten, B. 2023; 617 (7960): 312-324

    Abstract

    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

    View details for DOI 10.1038/s41586-023-05896-x

    View details for PubMedID 37165242

    View details for PubMedCentralID PMC10172123

  • Single cell and spatial alternative splicing analysis with long read sequencing. Research square Fu, Y., Kim, H., Adams, J. I., Grimes, S. M., Huang, S., Lau, B. T., Sathe, A., Hess, P., Ji, H. P., Zhang, N. R. 2023

    Abstract

    Long-read sequencing has become a powerful tool for alternative splicing analysis. However, technical and computational challenges have limited our ability to explore alternative splicing at single cell and spatial resolution. The higher sequencing error of long reads, especially high indel rates, has limited the accuracy of cell barcode and unique molecular identifier (UMI) recovery. Read truncation and mapping errors, the latter exacerbated by the higher sequencing error rates, can cause the false detection of spurious new isoforms. Downstream, there is as yet no rigorous statistical framework to quantify splicing variation within and between cells/spots. In light of these challenges, we developed Longcell, a statistical framework and computational pipeline for accurate isoform quantification for single cell and spatial spot barcoded long read sequencing data. Longcell performs computationally efficient cell/spot barcode extraction, UMI recovery, and UMI-based truncation- and mapping-error correction. Through a statistical model that accounts for varying read coverage across cells/spots, Longcell rigorously quantifies the level of inter-cell/spot versus intra-cell/spot diversity in exon-usage and detects changes in splicing distributions between cell populations. Applying Longcell to single cell long-read data from multiple contexts, we found that intra-cell splicing heterogeneity, where multiple isoforms co-exist within the same cell, is ubiquitous for highly expressed genes. On matched single cell and Visium long read sequencing for a tissue of colorectal cancer metastasis to the liver, Longcell found concordant signals between the two data modalities. Finally, on a perturbation experiment for 9 splicing factors, Longcell identified regulatory targets that are validated by targeted sequencing.

    View details for DOI 10.21203/rs.3.rs-2674892/v1

    View details for PubMedID 36993612

    View details for PubMedCentralID PMC10055662

  • GITR and TIGIT immunotherapy provokes divergent multi-cellular responses in the tumor microenvironment of gastrointestinal cancers. bioRxiv : the preprint server for biology Sathe, A., Ayala, C., Bai, X., Grimes, S. M., Lee, B., Kin, C., Shelton, A., Poultsides, G., Ji, H. P. 2023

    Abstract

    Understanding the cellular mechanisms of novel immunotherapy agents in the human tumor microenvironment (TME) is critical to their clinical success. We examined GITR and TIGIT immunotherapy in gastric and colon cancer patients using ex vivo tumor slice cultures derived from cancer surgical resections. This primary culture system maintains the original TME in a near-native state. We applied paired single-cell RNA and TCR sequencing to identify cell type specific transcriptional reprogramming. The GITR agonist was limited to increasing effector gene expression only in cytotoxic CD8 T cells. The TIGIT antagonist increased TCR signaling and activated both cytotoxic and dysfunctional CD8 T cells, including clonotypes indicative of potential tumor antigen reactivity. The TIGIT antagonist also activated T follicular helper-like cells and dendritic cells, and reduced markers of immunosuppression in regulatory T cells. Overall, we identified cellular mechanisms of action of these two immunotherapy targets in the patients' TME.

    View details for DOI 10.1101/2023.03.13.532299

    View details for PubMedID 36993756

    View details for PubMedCentralID PMC10054933

  • Single Cell Transcriptomic Analysis of Human Extra- and Intra-Hepatic Cholangiocarcinoma Ayala, C. I., Sathe, A., Grimes, S., Bae, X., Dua, M., Poultsides, G., Visser, B., Ji, H. SPRINGER. 2023: S177-S178
  • The Gastric Cancer Registry Genome Explorer: A tool for genomic discovery. Almeda, A., Grimes, S. M., Shin, G., Lee, H., Wichmann, I., Greer, S., Ji, H. P. LIPPINCOTT WILLIAMS & WILKINS. 2023: 434
  • Tumor-associated microbiome features of metastatic colorectal cancer and clinical implications. Frontiers in oncology An, H. J., Partha, M. A., Lee, H., Lau, B. T., Pavlichin, D. S., Almeda, A., Hooker, A. C., Shin, G., Ji, H. P. 2023; 13: 1310054

    Abstract

    Colon microbiome composition contributes to the pathogenesis of colorectal cancer (CRC) and prognosis. We analyzed 16S rRNA sequencing data from tumor samples of patients with metastatic CRC and determined the clinical implications. We enrolled 133 patients with metastatic CRC at St. Vincent Hospital in Korea. The V3-V4 regions of the 16S rRNA gene from the tumor DNA were amplified, sequenced on an Illumina MiSeq, and analyzed using the DADA2 package. After excluding samples that retained <5% of the total reads after merging, 120 samples were analyzed. The median age of patients was 63 years (range, 34-82 years), and 76 patients (63.3%) were male. The primary cancer sites were the right colon (27.5%), left colon (30.8%), and rectum (41.7%). All subjects received 5-fluorouracil-based systemic chemotherapy. After removing genera with <1% of the total reads in each patient, 523 genera were identified. Rectal origin, high CEA level (≥10 ng/mL), and presence of lung metastasis showed higher richness. Survival analysis revealed that the presence of Prevotella (p = 0.052), Fusobacterium (p = 0.002), Selenomonas (p<0.001), Fretibacterium (p = 0.001), Porphyromonas (p = 0.007), Peptostreptococcus (p = 0.002), and Leptotrichia (p = 0.003) were associated with short overall survival (OS, <24 months), while the presence of Sphingomonas was associated with long OS (p = 0.070). From the multivariate analysis, the presence of Selenomonas (hazard ratio [HR], 6.35; 95% confidence interval [CI], 2.38-16.97; p<0.001) was associated with poor prognosis along with high CEA level. Tumor microbiome features may be useful prognostic biomarkers for metastatic CRC.

    View details for DOI 10.3389/fonc.2023.1310054

    View details for PubMedID 38304032

    View details for PubMedCentralID PMC10833227

  • Large Cancer Pedigree Involving Multiple Cancer Genes including Likely Digenic MSH2 and MSH6 Lynch Syndrome (LS) and an Instance of Recombinational Rescue from LS. Cancers Vogelaar, I. P., Greer, S., Wang, F., Shin, G., Lau, B., Hu, Y., Haraldsdottir, S., Alvarez, R., Hazelett, D., Nguyen, P., Aguirre, F. P., Guindi, M., Hendifar, A., Balcom, J., Leininger, A., Fairbank, B., Ji, H., Hitchins, M. P. 2022; 15 (1)

    Abstract

    Lynch syndrome (LS), caused by heterozygous pathogenic variants affecting one of the mismatch repair (MMR) genes (MSH2, MLH1, MSH6, PMS2), confers moderate to high risks for colorectal, endometrial, and other cancers. We describe a four-generation, 13-branched pedigree in which multiple LS branches carry the MSH2 pathogenic variant c.2006G>T (p.Gly669Val), one branch has this and an additional novel MSH6 variant c.3936_4001+8dup (intronic), and other non-LS branches carry variants within other cancer-relevant genes (NBN, MC1R, PTPRJ). Both MSH2 c.2006G>T and MSH6 c.3936_4001+8dup caused aberrant RNA splicing in carriers, including out-of-frame exon-skipping, providing functional evidence of their pathogenicity. MSH2 and MSH6 are co-located on Chr2p21, but the two variants segregated independently (mapped in trans) within the digenic branch, with carriers of either or both variants. Thus, MSH2 c.2006G>T and MSH6 c.3936_4001+8dup independently confer LS with differing cancer risks among family members in the same branch. Carriers of both variants have near 100% risk of transmitting either one to offspring. Nevertheless, a female carrier of both variants did not transmit either to one son, due to a germline recombination within the intervening region. Genetic diagnosis, risk stratification, and counseling for cancer and inheritance were highly individualized in this family. The finding of multiple cancer-associated variants in this pedigree illustrates a need to consider offering multicancer gene panel testing, as opposed to targeted cascade testing, as additional cancer variants may be uncovered in relatives.

    View details for DOI 10.3390/cancers15010228

    View details for PubMedID 36612224

  • Activating Immune Effectors and Dampening Immune Suppressors Generates Successful Therapeutic Cancer Vaccination in Patients with Lymphoma Shree, T., Haebe, S., Czerwinski, D. K., Eckhert, E., Day, G., Sathe, A., Grimes, S. M., Frank, M. J., Maeda, L. S., Alizadeh, A. A., Advani, R. H., Hoppe, R., Long, S. R., Martin, B., Ozawa, M. G., Khodadoust, M. S., Ji, H. P., Levy, R. AMER SOC HEMATOLOGY. 2022: 6450-6451
  • Prevalence of Acquired N-Glycosylation Sites at the Single Cell Level in Follicular Lymphoma Haebe, S., Shree, T., Day, G., Czerwinski, D. K., Sathe, A., Grimes, S. M., Long, S. R., Martin, B., Ozawa, M. G., Ji, H. P., Levy, R. AMER SOC HEMATOLOGY. 2022: 9211-9212
  • Colorectal cancer metastases in the liver establish immunosuppressive spatial networking between tumor associated SPP1+ macrophages and fibroblasts. Clinical cancer research : an official journal of the American Association for Cancer Research Sathe, A., Mason, K., Grimes, S. M., Zhou, Z., Lau, B. T., Bai, X., Su, A., Tan, X., Lee, H., Suarez, C. J., Nguyen, Q., Poultsides, G., Zhang, N. R., Ji, H. P. 2022

    Abstract

    The liver is the most frequent metastatic site for colorectal cancer (CRC). Its microenvironment is modified to provide a niche that is conducive for CRC cell growth. This study focused on characterizing the cellular changes in the metastatic CRC (mCRC) liver tumor microenvironment (TME). We analyzed a series of microsatellite stable (MSS) mCRCs to the liver, paired normal liver tissue and peripheral blood mononuclear cells using single cell RNA-seq (scRNA-seq). We validated our findings using multiplexed spatial imaging and bulk gene expression with cell deconvolution. We identified TME-specific SPP1-expressing macrophages with altered metabolism features, foam cell characteristics and increased activity in extracellular matrix (ECM) organization. SPP1+ macrophages and fibroblasts expressed complementary ligand receptor pairs with the potential to mutually influence their gene expression programs. TME lacked dysfunctional CD8 T cells and contained regulatory T cells, indicative of immunosuppression. Spatial imaging validated these cell states in the TME. Moreover, TME macrophages and fibroblasts had close spatial proximity, which is a requirement for intercellular communication and networking. In an independent cohort of mCRCs in the liver, we confirmed the presence of SPP1+ macrophages and fibroblasts using gene expression data. An increased proportion of TME fibroblasts was associated with a worse prognosis in these patients. We demonstrated that mCRC in the liver is characterized by transcriptional alterations of macrophages in the TME. Intercellular networking between macrophages and fibroblasts supports CRC growth in the immunosuppressed metastatic niche in the liver. These features can be used to target immune checkpoint resistant MSS tumors.

    View details for DOI 10.1158/1078-0432.CCR-22-2041

    View details for PubMedID 36239989

  • RESOLVING THE EXACT BREAKPOINTS AND SEQUENCE REARRANGEMENTS OF LARGE NEUROPSYCHIATRIC COPY NUMBER VARIATIONS (CNVS) AT SINGLE BASE-PAIR RESOLUTION USING CRISPR-TARGETED ULTRALONG READ SEQUENCING (CTLR-SEQ) Zhou, B., Shin, G., Vervoort, L., Greer, S., Huang, Y., Roychowdhury, T., Pattni, R., Abyzov, A., Vermeesch, J., Ji, H., Urban, A. ELSEVIER. 2022: E88-E89
  • Predictive Model to Guide Brain Magnetic Resonance Imaging Surveillance in Patients With Metastatic Lung Cancer: Impact on Real-World Outcomes. JCO precision oncology Wu, J., Ding, V., Luo, S., Choi, E., Hellyer, J., Myall, N., Henry, S., Wood, D., Stehr, H., Ji, H., Nagpal, S., Hayden Gephart, M., Wakelee, H., Neal, J., Han, S. S. 2022; 6: e2200220

    Abstract

    Brain metastasis is common in lung cancer, and treatment of brain metastasis can lead to significant morbidity. Although early detection of brain metastasis may improve outcomes, there are no prediction models to identify high-risk patients for brain magnetic resonance imaging (MRI) surveillance. Our goal is to develop a machine learning-based clinicogenomic prediction model to estimate patient-level brain metastasis risk. A penalized regression competing risk model was developed using 330 patients diagnosed with lung cancer between January 2014 and June 2019 and followed through June 2021 at Stanford HealthCare. The main outcome was time from the diagnosis of distant metastatic disease to the development of brain metastasis, death, or censoring. Among the 330 patients, 84 (25%) developed brain metastasis over 627 person-years, with a 1-year cumulative brain metastasis incidence of 10.2% (95% CI, 6.8 to 13.6). Features selected for model inclusion were histology, cancer stage, age at diagnosis, primary site, and RB1 and ALK alterations. The prediction model yielded high discrimination (area under the curve 0.75). When the cohort was stratified by risk using a 1-year risk threshold of > 14.2% (85th percentile), the high-risk group had increased 1-year cumulative incidence of brain metastasis versus the low-risk group (30.8% v 6.1%, P < .01). Of 48 high-risk patients, 24 developed brain metastasis, and of these, 12 patients had brain metastasis detected more than 7 months after last brain MRI. Patients who missed this 7-month window had larger brain metastases (58% v 33% largest diameter > 10 mm; odds ratio, 2.80, CI, 0.51 to 13) versus those who had MRIs more frequently. The proposed model can identify high-risk patients, who may benefit from more intensive brain MRI surveillance to reduce morbidity of subsequent treatment through early detection.

    View details for DOI 10.1200/PO.22.00220

    View details for PubMedID 36201713

  • Exploratory genomic analysis of high grade neuroendocrine neoplasms across diverse primary sites. Endocrine-related cancer Sun, T. Y., Zhao, L., Van Hummelen, P., Martin, B., Hornbacker, K., Lee, H., Xia, L. C., Padda, S. K., Ji, H. P., Kunz, P. 2022

    Abstract

    High grade (grade 3) neuroendocrine neoplasms (G3 NENs) have poor survival outcomes. From a clinical standpoint, G3 NENs are usually grouped regardless of primary site and treated similarly. Little is known regarding the underlying genomics of these rare tumors, especially when compared across different primary sites. We performed whole transcriptome (n = 46), whole exome (n = 40) and gene copy number (n = 43) sequencing on G3 NEN FFPE samples from diverse organs (in total 17 were lung, 16 were gastroenteropancreatic, 13 other). G3 NENs despite arising from diverse primary sites did not have gene expression profiles that were easily segregated by organ of origin. Across all G3 NENs, TP53, APC, RB1 and CDKN2A were significantly mutated. The CDK4/6 cell cycling pathway was mutated in 95% of cases, with upregulation of oncogenes within this pathway. G3 NENs had high tumor mutation burden (mean 7.09 mutations/MB), with 20% having >10 mutations/MB. Two somatic copy number alterations were significantly associated with worse prognosis across tissue types: focal deletion 22q13.31 (HR, 7.82; p = 0.034) and arm amplification 19q (HR, 4.82; p = 0.032). This study is among the most diverse genomic studies of high-grade neuroendocrine neoplasms. We uncovered genomic features previously unrecognized for this rapidly fatal and rare cancer type that could have potential prognostic and therapeutic implications.

    View details for DOI 10.1530/ERC-22-0015

    View details for PubMedID 36165930

  • The Gastric Cancer Registry: A Genomic Translational Resource for Multidisciplinary Research in Gastric Cancer. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Almeda, A. F., Grimes, S. M., Lee, H., Greer, S., Shin, G., McNamara, M., Hooker, A. C., Arce, M. M., Kubit, M., Schauer, M. C., Van Hummelen, P., Ma, C., Mills, M. A., Huang, R. J., Hwang, J. H., Amieva, M. R., Han, S. S., Ford, J. M., Ji, H. P. 2022

    Abstract

    Gastric cancer (GC) is a leading cause of cancer morbidity and mortality. Developing information systems which integrate clinical and genomic data may accelerate discoveries to improve cancer prevention, detection, and treatment. To support translational research in GC, we developed the GC Registry (GCR), a North American repository of clinical and cancer genomics data. Participants self-enrolled online. Entry criteria into the GCR included the following: (1) diagnosis of GC, (2) history of GC in a first- or second-degree relative, or (3) known germline mutation in the gene CDH1. Participants provided demographic and clinical information through a detailed survey. Some participants provided specimens of saliva and tumor samples. Tumor samples underwent exome sequencing, whole genome sequencing and transcriptome sequencing. From 2011-2021, 567 individuals registered and returned the clinical questionnaire. For this cohort 65% had a personal history of GC, 36% reported a family history of GC and 14% had a germline CDH1 mutation. 89 GC patients provided tumor samples. For the initial study, 41 tumors were sequenced using next generation sequencing. The data was analyzed for cancer mutations, copy number variations, gene expression, microbiome, neoantigens, immune infiltrates, and other features. We developed a searchable, web-based interface (the GCR Genome Explorer) to give researchers access to these datasets. The GCR is a unique, North American GC registry which integrates clinical and genomic annotation. Available for researchers through an open access, web-based explorer, the GCR Genome Explorer will accelerate collaborative GC research across the United States and world.

    View details for DOI 10.1158/1055-9965.EPI-22-0308

    View details for PubMedID 35771165

  • Germline variants of ATG7 in familial cholangiocarcinoma alter autophagy and p62. Scientific reports Greer, S. U., Chen, J., Ogmundsdottir, M. H., Ayala, C., Lau, B. T., Delacruz, R. G., Sandoval, I. T., Kristjansdottir, S., Jones, D. A., Haslem, D. S., Romero, R., Fulde, G., Bell, J. M., Jonasson, J. G., Steingrimsson, E., Ji, H. P., Nadauld, L. D. 2022; 12 (1): 10333

    Abstract

    Autophagy is a housekeeping mechanism tasked with eliminating misfolded proteins and damaged organelles to maintain cellular homeostasis. Autophagy deficiency results in increased oxidative stress, DNA damage and chronic cellular injury. Among the core genes in the autophagy machinery, ATG7 is required for autophagy initiation and autophagosome formation. Based on the analysis of an extended pedigree of familial cholangiocarcinoma, we determined that all affected family members had a novel germline mutation (c.2000C>T p.Arg659* (p.R659*)) in ATG7. Somatic deletions of ATG7 were identified in the tumors of affected individuals. We applied linked-read sequencing to one tumor sample and demonstrated that the ATG7 somatic deletion and germline mutation were located on distinct alleles, resulting in two hits to ATG7. From a parallel population genetic study, we identified a germline polymorphism of ATG7 (c.1591C>G p.Asp522Glu (p.D522E)) associated with increased risk of cholangiocarcinoma. To characterize the impact of these germline ATG7 variants on autophagy activity, we developed an ATG7-null cell line derived from the human bile duct. The mutant p.R659* ATG7 protein lacked the ability to lipidate its LC3 substrate, leading to complete loss of autophagy and increased p62 levels. Our findings indicate that germline ATG7 variants have the potential to impact autophagy function with implications for cholangiocarcinoma development.

    View details for DOI 10.1038/s41598-022-13569-4

    View details for PubMedID 35725745

  • Reconstructing the spatial evolution of cancer through subclone detection on copy number profiles in tumor sequencing data Wu, C., Hess, P. R., Sathe, A., Rong, J., Lau, B. T., Grimes, S. M., Ji, H. P., Zhang, N. R. AMER ASSOC CANCER RESEARCH. 2022
  • A single-cell solution for solid tumors to detect mutations and quantify copy number variations. Wu, C., Hess, P. R., Sathe, A., Rong, J., Lau, B. T., Grimes, S. M., Ji, H. P., Zhang, N. R. AMER ASSOC CANCER RESEARCH. 2022
  • ALTEN: A High-Fidelity Primary Tissue-Engineering Platform to Assess Cellular Responses Ex Vivo. Advanced science (Weinheim, Baden-Wurttemberg, Germany) Law, A. M., Chen, J., Colino-Sanguino, Y., Fuente, L. R., Fang, G., Grimes, S. M., Lu, H., Huang, R. J., Boyle, S. T., Venhuizen, J., Castillo, L., Tavakoli, J., Skhinas, J. N., Millar, E. K., Beretov, J., Rossello, F. J., Tipper, J. L., Ormandy, C. J., Samuel, M. S., Cox, T. R., Martelotto, L., Jin, D., Valdes-Mora, F., Ji, H. P., Gallego-Ortega, D. 2022: e2103332

    Abstract

    To fully investigate cellular responses to stimuli and perturbations within tissues, it is essential to replicate the complex molecular interactions within the local microenvironment of cellular niches. Here, the authors introduce Alginate-based tissue engineering (ALTEN), a biomimetic tissue platform that allows ex vivo analysis of explanted tissue biopsies. This method preserves the original characteristics of the source tissue's cellular milieu, allowing multiple and diverse cell types to be maintained over an extended period of time. As a result, ALTEN enables rapid and faithful characterization of perturbations across specific cell types within a tissue. Importantly, using single-cell genomics, this approach provides integrated cellular responses at the resolution of individual cells. ALTEN is a powerful tool for the analysis of cellular responses upon exposure to cytotoxic agents and immunomodulators. Additionally, ALTEN's scalability using automated microfluidic devices for tissue encapsulation and subsequent transport, to enable centralized high-throughput analysis of samples gathered by large-scale multicenter studies, is shown.

    View details for DOI 10.1002/advs.202103332

    View details for PubMedID 35611998

  • KmerKeys: a web resource for searching indexed genome assemblies and variants. Nucleic acids research Pavlichin, D. S., Lee, H., Greer, S. U., Grimes, S. M., Weissman, T., Ji, H. P. 2022

    Abstract

    K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences has challenges related to the massive scale of genomic sequence data. A single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when involving complete genome assemblies. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides performant, rapid query speeds for cloud computation on genome assemblies. We enable fuzzy as well as exact sequence searches of assemblies. To enable robust and speedy performance, the website implements cache-friendly hash tables, memory mapping and massive parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. One can include variant databases and their associated metadata such as the gnomAD population variant catalogue. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.

    View details for DOI 10.1093/nar/gkac266

    View details for PubMedID 35474383

  • The Human Pangenome Project: a global resource to map genomic diversity. Nature Wang, T., Antonacci-Fulton, L., Howe, K., Lawson, H. A., Lucas, J. K., Phillippy, A. M., Popejoy, A. B., Asri, M., Carson, C., Chaisson, M. J., Chang, X., Cook-Deegan, R., Felsenfeld, A. L., Fulton, R. S., Garrison, E. P., Garrison, N. A., Graves-Lindsay, T. A., Ji, H., Kenny, E. E., Koenig, B. A., Li, D., Marschall, T., McMichael, J. F., Novak, A. M., Purushotham, D., Schneider, V. A., Schultz, B. I., Smith, M. W., Sofia, H. J., Weissman, T., Flicek, P., Li, H., Miga, K. H., Paten, B., Jarvis, E. D., Hall, I. M., Eichler, E. E., Haussler, D., Human Pangenome Reference Consortium 2022; 604 (7906): 437-446

    Abstract

    The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

    View details for DOI 10.1038/s41586-022-04601-8

    View details for PubMedID 35444317

  • A deep learning model for molecular label transfer that enables cancer cell identification from histopathology images. NPJ precision oncology Su, A., Lee, H., Tan, X., Suarez, C. J., Andor, N., Nguyen, Q., Ji, H. P. 2022; 6 (1): 14

    Abstract

    Deep-learning classification systems have the potential to improve cancer diagnosis. However, development of these computational approaches so far depends on prior pathological annotations and large training datasets. The manual annotation is low-resolution, time-consuming, highly variable and subject to observer variance. To address this issue, we developed a method, H&E Molecular neural network (HEMnet). HEMnet utilizes immunohistochemistry as an initial molecular label for cancer cells on an H&E image and trains a cancer classifier on the overlapping clinical histopathological images. Using this molecular transfer method, HEMnet successfully generated and labeled 21,939 tumor and 8,782 normal tiles from ten whole-slide images for model training. After building the model, HEMnet accurately identified colorectal cancer regions, which achieved 0.84 and 0.73 of ROC AUC values compared to p53 staining and pathological annotations, respectively. Our validation study using histopathology images from TCGA samples accurately estimated tumor purity, which showed a significant correlation (regression coefficient of 0.8) with the estimation based on genomic sequencing data. Thus, HEMnet contributes to addressing two main challenges in cancer deep-learning analysis, namely the need to have a large number of images for training and the dependence on manual labeling by a pathologist. HEMnet also predicts cancer cells at a much higher resolution compared to manual histopathologic evaluation. Overall, our method provides a path towards a fully automated delineation of any type of tumor so long as there is a cancer-oriented molecular stain available for subsequent learning. Software, tutorials and interactive tools are available at: https://github.com/BiomedicalMachineLearning/HEMnet.

    View details for DOI 10.1038/s41698-022-00252-0

    View details for PubMedID 35236916

  • Analysis of 16S rRNA sequencing in advanced colorectal cancer tissue samples An, H., Partha, M. A., Lee, H., Lau, B., Shin, G., Almeda, A., Ji, H. P. LIPPINCOTT WILLIAMS & WILKINS. 2022
  • Single-cell characterization of CRISPR-modified transcript isoforms with nanopore sequencing. Genome biology Kim, H. S., Grimes, S. M., Hooker, A. C., Lau, B. T., Ji, H. P. 2021; 22 (1): 331

    Abstract

    We developed a single-cell approach to detect CRISPR-modified mRNA transcript structures. This method assesses how genetic variants at splicing sites and splicing factors contribute to alternative mRNA isoforms. We determine how alternative splicing is regulated by editing target exon-intron segments or splicing factors by CRISPR-Cas9 and their consequences on transcriptome profile. Our method combines long-read sequencing to characterize the transcript structure and short-read sequencing to match the single-cell gene expression profiles and gRNA sequence and therefore provides targeted genomic edits and transcript isoform structure detection at single-cell resolution.

    View details for DOI 10.1186/s13059-021-02554-1

    View details for PubMedID 34872615

  • Characterization of the consensus mucosal microbiome of colorectal cancer. NAR cancer Zhao, L., Grimes, S. M., Greer, S. U., Kubit, M., Lee, H., Nadauld, L. D., Ji, H. P. 2021; 3 (4): zcab049

    Abstract

    Dysbiosis is an imbalance of an organ's microbiome and plays a role in colorectal cancer pathogenesis. Characterizing the bacteria in the microenvironment of a cancer through genome sequencing has advantages compared to culture-based profiling. However, there are notable technical and analytical challenges in characterizing universal features of tumor microbiomes. Colorectal tumors demonstrate microbiome variation among different studies and across individual patients. To address these issues, we conducted a computational study to determine a consensus microbiome for colorectal cancer, analyzing 924 tumors from eight independent RNA-Seq data sets. A standardized meta-transcriptomic analysis pipeline was established with quality control metrics. Microbiome profiles across different cohorts were compared and recurrently altered microbial shifts specific to colorectal cancer were determined. We identified a cancer-specific set of 114 microbial species associated with tumors that were found among all investigated studies. Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria were the four most abundant phyla for the colorectal cancer microbiome. Member species of Clostridia were depleted and Fusobacterium nucleatum was one of the most enriched bacterial species in tumors. Associations between the consensus species and specific immune cell types were noted. Our results are available as a web data resource for other researchers to explore (https://crc-microbiome.stanford.edu).

    View details for DOI 10.1093/narcan/zcab049

    View details for PubMedID 34988460

  • In Situ Vaccination Induces Changes in Follicular Lymphoma Tumor Cells That Correlate with Abscopal Clinical Regressions Haebe, S., Shree, T., Day, G., Sathe, A., Czerwinski, D. K., Grimes, S. M., Long, S. R., Martin, B., Hoppe, R., Ji, H. P., Levy, R. AMER SOC HEMATOLOGY. 2021
  • Therapeutic and Immunologic Responses Elicited By in Situ Vaccination with CpG, Ibrutinib, and Low-Dose Radiation Shree, T., Haebe, S., Czerwinski, D. K., Day, G., Sathe, A., Khodadoust, M. S., Frank, M. J., Beygi, S., Hoppe, R., Long, S. R., Martin, B., Ji, H. P., Levy, R. AMER SOC HEMATOLOGY. 2021
  • Single-Cell Transcriptomic Analysis of a Patient with Metastatic Appendiceal Adenocarcinoma: A Stem or Crypt Cell-Like Neoplasm? Ayala, C., Grimes, S. M., Lee, B., Ji, H. ELSEVIER SCIENCE INC. 2021: S240-S241
  • A Predictive Model to Guide Brain MRI Surveillance in Patients With Metastatic Lung Cancer: Impact on Real World Outcomes Wu, J., Ding, V., Luo, S., Choi, E., Hellyer, J., Myall, N., Henry, S., Wood, D., Stehr, H., Ji, H., Nagpal, S., Gephart, M., Wakelee, H., Neal, J., Han, S. ELSEVIER SCIENCE INC. 2021: S1177
  • Profiling diverse sequence tandem repeats in colorectal cancer reveals co-occurrence of microsatellite and chromosomal instability involving Chromosome 8. Genome medicine Shin, G., Greer, S. U., Hopmans, E., Grimes, S. M., Lee, H., Zhao, L., Miotke, L., Suarez, C., Almeda, A. F., Haraldsdottir, S., Ji, H. P. 2021; 13 (1): 145

    Abstract

    We developed a sensitive sequencing approach that simultaneously profiles microsatellite instability, chromosomal instability, and subclonal structure in cancer. We assessed diverse repeat motifs across 225 microsatellites on colorectal carcinomas. Our study identified elevated alterations at both selected tetranucleotide and conventional mononucleotide repeats. Many colorectal carcinomas had a mix of genomic instability states that are normally considered exclusive. An MSH3 mutation may have contributed to the mixed states. Increased copy number of chromosome arm 8q was most prevalent among tumors with microsatellite instability, including a case of translocation involving 8q. Subclonal analysis identified co-occurring driver mutations previously known to be exclusive.

    View details for DOI 10.1186/s13073-021-00958-z

    View details for PubMedID 34488871

  • Patient-derived ex vivo TME-models and single-cell sequencing reveal transcriptional responses to immunotherapy. Sathe, A., Chen, J., Grimes, S. M., Ayala, C. I., Poultsides, G., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2021
  • New Approaches to Moderate CRISPR-Cas9 Activity: Addressing Issues of Cellular Uptake and Endosomal Escape. Molecular therapy : the journal of the American Society of Gene Therapy van Hees, M., Slott, S., Hansen, A. H., Kim, H. S., Ji, H. P., Astakhova, K. 2021

    Abstract

    CRISPR-Cas9 is rapidly entering molecular biology and biomedicine as a promising gene-editing tool. A unique feature of CRISPR-Cas9 is a single guide RNA directing a Cas9 nuclease towards its genomic target. Herein, we highlight new approaches for improving cellular uptake and endosomal escape of CRISPR-Cas9. As opposed to other recently published works, this review is focused on non-viral carriers as a means to facilitate the cellular uptake of CRISPR-Cas9 through endocytosis. The majority of non-viral carriers, such as gold nanoparticles, polymer nanoparticles, lipid nanoparticles and nanoscale zeolitic imidazole frameworks, are developed with a focus towards optimizing the endosomal escape of CRISPR-Cas9 by taking advantage of the acidic environment in the late endosomes. Among the most broadly used methods for in vitro and ex vivo ribonucleotide protein transfection are electroporation and microinjection. Thus, other delivery formats are warranted for in vivo delivery of CRISPR-Cas9. Herein, we specifically review the use of peptide and nanoparticle-based systems as platforms for CRISPR-Cas9 delivery in vivo. Finally, we highlight future perspectives of the CRISPR-Cas9 gene-editing tool and the prospects of using non-viral vectors to improve its bioavailability and therapeutic potential.

    View details for DOI 10.1016/j.ymthe.2021.06.003

    View details for PubMedID 34091053

  • Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nature biotechnology Wu, C., Lau, B. T., Kim, H. S., Sathe, A., Grimes, S. M., Ji, H. P., Zhang, N. R. 2021

    Abstract

    Cancer progression is driven by both somatic copy number aberrations (CNAs) and chromatin remodeling, yet little is known about the interplay between these two classes of events in shaping the clonal diversity of cancers. We present Alleloscope, a method for allele-specific copy number estimation that can be applied to single-cell DNA- and/or transposase-accessible chromatin-sequencing (scDNA-seq, ATAC-seq) data, enabling combined analysis of allele-specific copy number and chromatin accessibility. On scDNA-seq data from gastric, colorectal and breast cancer samples, with validation using matched linked-read sequencing, Alleloscope finds pervasive occurrence of highly complex, multiallelic CNAs, in which cells that carry varying allelic configurations adding to the same total copy number coevolve within a tumor. On scATAC-seq from two basal cell carcinoma samples and a gastric cancer cell line, Alleloscope detected multiallelic copy number events and copy-neutral loss-of-heterozygosity, enabling dissection of the contributions of chromosomal instability and chromatin remodeling to tumor evolution.

    View details for DOI 10.1038/s41587-021-00911-w

    View details for PubMedID 34017141

  • Profiling SARS-CoV-2 mutation fingerprints that range from the viral pangenome to individual infection quasispecies. Genome medicine Lau, B. T., Pavlichin, D., Hooker, A. C., Almeda, A., Shin, G., Chen, J., Sahoo, M. K., Huang, C. H., Pinsky, B. A., Lee, H. J., Ji, H. P. 2021; 13 (1): 62

    Abstract

    BACKGROUND: The genome of SARS-CoV-2 is susceptible to mutations during viral replication due to the errors generated by RNA-dependent RNA polymerases. These mutations enable the SARS-CoV-2 to evolve into new strains. Viral quasispecies emerge from de novo mutations that occur in individual patients. In combination, these sets of viral mutations provide distinct genetic fingerprints that reveal the patterns of transmission and have utility in contact tracing. METHODS: Leveraging thousands of sequenced SARS-CoV-2 genomes, we performed a viral pangenome analysis to identify conserved genomic sequences. We used a rapid and highly efficient computational approach that relies on k-mers, short tracts of sequence, instead of conventional sequence alignment. Using this method, we annotated viral mutation signatures that were associated with specific strains. Based on these highly conserved viral sequences, we developed a rapid and highly scalable targeted sequencing assay to identify mutations, detect quasispecies variants, and identify mutation signatures from patients. These results were compared to the pangenome genetic fingerprints. RESULTS: We built a k-mer index for thousands of SARS-CoV-2 genomes and identified conserved genomic regions and the landscape of mutations across thousands of virus genomes. We delineated mutation profiles spanning common genetic fingerprints (the combination of mutations in a viral assembly) and a combination of mutations that appear in only a small number of patients. We developed a targeted sequencing assay by selecting primers from the conserved viral genome regions to flank frequent mutations. Using a cohort of 100 SARS-CoV-2 clinical samples, we identified genetic fingerprints consisting of strain-specific mutations seen across populations and de novo quasispecies mutations localized to individual infections. We compared the mutation profiles of viral samples undergoing analysis with the features of the pangenome. CONCLUSIONS: We conducted an analysis for viral mutation profiles that provide the basis of genetic fingerprints. Our study linked pangenome analysis with targeted deep sequenced SARS-CoV-2 clinical samples. We identified quasispecies mutations occurring within individual patients and determined their general prevalence when compared to over 70,000 other strains. Analysis of these genetic fingerprints may provide a way of conducting molecular contact tracing.

    View details for DOI 10.1186/s13073-021-00882-2

    View details for PubMedID 33875001

  • An expanded universe of cancer targets. Cell Hahn, W. C., Bader, J. S., Braun, T. P., Califano, A., Clemons, P. A., Druker, B. J., Ewald, A. J., Fu, H., Jagu, S., Kemp, C. J., Kim, W., Kuo, C. J., McManus, M., Mills, G. B., Mo, X., Sahni, N., Schreiber, S. L., Talamas, J. A., Tamayo, P., Tyner, J. W., Wagner, B. K., Weiss, W. A., Gerhard, D. S., Cancer Target Discovery and Development Network, Dancik, V., Gill, S., Hua, B., Sharifnia, T., Viswanathan, V., Zou, Y., Dela Cruz, F., Kung, A., Stockwell, B., Boehm, J., Dempster, J., Manguso, R., Vazquez, F., Cooper, L. A., Du, Y., Ivanov, A., Lonial, S., Moreno, C. S., Niu, Q., Owonikoko, T., Ramalingam, S., Reyna, M., Zhou, W., Grandori, C., Shmulevich, I., Swisher, E., Cai, J., Chan, I. S., Dunworth, M., Ge, Y., Georgess, D., Grasset, E. M., Henriet, E., Knutsdottir, H., Lerner, M. G., Padmanaban, V., Perrone, M. C., Suhail, Y., Tsehay, Y., Warrier, M., Morrow, Q., Nechiporuk, T., Long, N., Saultz, J., Kaempf, A., Minnier, J., Tognon, C. E., Kurtz, S. E., Agarwal, A., Brown, J., Watanabe-Smith, K., Vu, T. Q., Jacob, T., Yan, Y., Robinson, B., Lind, E. F., Kosaka, Y., Demir, E., Estabrook, J., Grzadkowski, M., Nikolova, O., Chen, K., Deneen, B., Liang, H., Bassik, M. C., Bhattacharya, A., Brennan, K., Curtis, C., Gevaert, O., Ji, H. P., Karlsson, K. A., Karagyozova, K., Lo, Y., Liu, K., Nakano, M., Sathe, A., Smith, A. R., Spees, K., Wong, W. H., Yuki, K., Hangauer, M., Kaufman, D. S., Balmain, A., Bollam, S. R., Chen, W., Fan, Q., Kersten, K., Krummel, M., Li, Y. R., Menard, M., Nasholm, N., Schmidt, C., Serwas, N. K., Yoda, H. 2021; 184 (5): 1142–55

    Abstract

    The characterization of cancer genomes has provided insight into somatically altered genes across tumors, transformed our understanding of cancer biology, and enabled tailoring of therapeutic strategies. However, the function of most cancer alleles remains mysterious, and many cancer features transcend their genomes. Consequently, tumor genomic characterization does not influence therapy for most patients. Approaches to understand the function and circuitry of cancer genes provide complementary approaches to elucidate both oncogene and non-oncogene dependencies. Emerging work indicates that the diversity of therapeutic targets engendered by non-oncogene dependencies is much larger than the list of recurrently mutated genes. Here we describe a framework for this expanded list of cancer targets, providing novel opportunities for clinical translation.

    View details for DOI 10.1016/j.cell.2021.02.020

    View details for PubMedID 33667368

  • Single Cell Analysis Can Define Distinct Evolution of Tumor Sites in Follicular Lymphoma. Blood Haebe, S. E., Shree, T., Sathe, A., Day, G., Czerwinski, D. K., Grimes, S., Lee, H., Binkley, M. S., Long, S. R., Martin, B. A., Ji, H. P., Levy, R. 2021

    Abstract

    Tumor heterogeneity complicates biomarker development and fosters drug resistance in solid malignancies. In lymphoma, our knowledge of site-to-site heterogeneity and its clinical implications is still limited. Here, we profiled two nodal, synchronously-acquired tumor samples from ten follicular lymphoma patients using single cell RNA, B cell receptor (BCR) and T cell receptor sequencing, and flow cytometry. By following the rapidly mutating tumor immunoglobulin genes, we discovered that BCR subclones were shared between the two tumor sites in some patients, but in many patients the disease had evolved separately with limited tumor cell migration between the sites. Patients exhibiting divergent BCR evolution also exhibited divergent tumor gene expression and cell surface protein profiles. While the overall composition of the tumor microenvironment did not differ significantly between sites, we did detect a specific correlation between site-to-site tumor heterogeneity and T follicular helper (Tfh) cell abundance. We further observed enrichment of particular ligand-receptor pairs between tumor and Tfh cells, including CD40 and CD40LG, and a significant correlation between tumor CD40 expression and Tfh proliferation. Our study may explain discordant responses to systemic therapies, underscores the difficulty of capturing a patient's disease with a single biopsy, and furthers our understanding of tumor-immune networks in follicular lymphoma.

    View details for DOI 10.1182/blood.2020009855

    View details for PubMedID 33728464

  • Pepsinogens and Gastrin Demonstrate Low Discrimination for Gastric Precancerous Lesions in a Multi-Ethnic United States Cohort. Clinical gastroenterology and hepatology : the official clinical practice journal of the American Gastroenterological Association Huang, R., Park, S., Shen, J., Longacre, T., Ji, H., Hwang, J. H. 2021

    View details for DOI 10.1016/j.cgh.2021.01.009

    View details for PubMedID 33434656

  • Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations. NAR cancer Lee, H., Shuaibi, A., Bell, J. M., Pavlichin, D. S., Ji, H. P. 2020; 2 (4): zcaa034

    Abstract

    Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without a mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations including insertions and deletions across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.

    View details for DOI 10.1093/narcan/zcaa034

    View details for PubMedID 33345188
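
    The k-mer validation idea in the abstract above can be illustrated with a small sketch. This is not KmerVC's code: the function names, the toy sequences, and the choice of a one-sided Fisher's exact test are assumptions made for illustration, and the sketch skips the genome-wide k-mer uniqueness check the abstract describes. It counts k-mers that overlap a substitution in tumor and normal reads, then asks whether mutant k-mers are enriched in the tumor.

```python
from collections import Counter
from math import comb


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def spanning_kmers(ref, pos, alt, k):
    """Return (ref_only, alt_only) k-mer sets overlapping a substitution at 0-based pos."""
    mutated = ref[:pos] + alt + ref[pos + 1:]
    lo, hi = max(0, pos - k + 1), min(len(ref), pos + k)
    ref_set = set(kmers(ref[lo:hi], k))
    alt_set = set(kmers(mutated[lo:hi], k))
    return ref_set - alt_set, alt_set - ref_set


def count_support(reads, kmer_set, k):
    """Count how often k-mers from kmer_set occur across all reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return sum(counts[km] for km in kmer_set)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


# Toy data (hypothetical): a G>T substitution at position 7.
K = 5
REF = "GATTACAGATTTCCA"
MUT = REF[:7] + "T" + REF[8:]
tumor_reads = [MUT] * 3 + [REF] * 3
normal_reads = [REF] * 6

ref_k, alt_k = spanning_kmers(REF, 7, "T", K)
table = (count_support(tumor_reads, alt_k, K), count_support(tumor_reads, ref_k, K),
         count_support(normal_reads, alt_k, K), count_support(normal_reads, ref_k, K))
p_value = fisher_one_sided(*table)
print(table, p_value)
```

    Note that k-mers drawn from the same read are not independent observations, so the p-value here is optimistic; a real implementation would need to account for that and for sequencing errors.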

  • SPATIAL SINGLE-CELL ANALYSIS OF COLORECTAL CANCER TUMOUR USING MULTIPLEXED IMAGING MASS CYTOMETRY Minh Tran, Su, A., Lee, H., Cruz, R., Pflieger, L., Dean, A., Quan Nguyen, Ji, H., Rhodes, T. BMJ PUBLISHING GROUP. 2020: A399
  • IDENTIFY IMMUNE CELL TYPES AND BIOMARKERS ASSOCIATED WITH IMMUNE-RELATED ADVERSE EVENTS USING SINGLE CELL RNA SEQUENCING Chen, J., Pflieger, L., Grimes, S., Baker, T., Brems, M., Fulde, G., Snow, S., Howe, P., Sathe, A., Christensen, B., Ji, H., Rhodes, T. BMJ PUBLISHING GROUP. 2020: A39
  • The COVID-19 XPRIZE and the need for scalable, fast, and widespread testing. Nature biotechnology MacKay, M. J., Hooker, A. C., Afshinnekoo, E., Salit, M., Kelly, J., Feldstein, J. V., Haft, N., Schenkel, D., Nambi, S., Cai, Y., Zhang, F., Church, G., Dai, J., Wang, C. L., Levy, S., Huber, J., Ji, H. P., Kriegel, A., Wyllie, A. L., Mason, C. E. 2020

    View details for DOI 10.1038/s41587-020-0655-4

    View details for PubMedID 32820257

  • A Summary of the 2020 Gastric Cancer Summit at Stanford University. Gastroenterology Huang, R. J., Koh, H., Hwang, J. H., Summit Leaders, Abnet, C. C., Alarid-Escudero, F., Amieva, M. R., Bruce, M. G., Camargo, M. C., Chan, A. T., Choi, I. J., Corvalan, A., Davis, J. L., Deapen, D., Epplein, M., Greenwald, D. A., Hamashima, C., Hur, C., Inadomi, J. M., Ji, H. P., Jung, H., Lee, E., Lin, B., Palaniappan, L. P., Parsonnet, J., Peek, R. M., Piazuelo, M. B., Rabkin, C. S., Shah, S. C., Smith, A., So, S., Stoffel, E. M., Umar, A., Wilson, K. T., Woo, Y., Yeoh, K. G. 2020

    View details for DOI 10.1053/j.gastro.2020.05.100

    View details for PubMedID 32707045

  • CRISPRpic: fast and precise analysis for CRISPR-induced mutations via prefixed index counting. NAR genomics and bioinformatics Lee, H., Chang, H. Y., Cho, S. W., Ji, H. P. 2020; 2 (2): lqaa012

    Abstract

    Analysis of CRISPR-induced mutations at a targeted locus can be achieved by polymerase chain reaction amplification followed by massively parallel sequencing. We developed a novel algorithm, named CRISPRpic, to analyze the sequencing reads from CRISPR experiments by counting exact matches and searching for patterns. Compared to other methods based on sequence alignment, CRISPRpic provides precise mutation calling and ultrafast analysis of the sequencing results. A Python script of CRISPRpic is available at https://github.com/compbio/CRISPRpic.

    View details for DOI 10.1093/nargab/lqaa012

    View details for PubMedID 32118203

  • Identify biomarkers associated with immunotoxicities using single-cell RNAseq. Chen, J., Pflieger, L., Sathe, A., Grimes, S., Brems, M., Pattison, T., Christensen, B., Rhodes, T., Ji, H. AMER ASSOC CANCER RESEARCH. 2020: 32
  • Comparative Genomic Analysis of High Grade Neuroendocrine Neoplasms across Diverse Organs Sun, T. Y., Van Hummelen, P., Martin, B., Xia, C., Zhao, L., Hornbacker, K., Lee, H., Ji, H., Kunz, P. KARGER. 2020: 51
  • Comprehensive genomic sequencing of high-grade neuroendocrine neoplasms Sun, T., Van Hummelen, P., Martin, B., Xia, C., Lee, H., Zhao, L., Hornbacker, K., Ji, H., Kunz, P. L. AMER SOC CLINICAL ONCOLOGY. 2020
  • Whole genome analysis identifies the association of TP53 genomic deletions with lower survival in Stage III colorectal cancer. Scientific reports Xia, L. C., Van Hummelen, P., Kubit, M., Lee, H., Bell, J. M., Grimes, S. M., Wood-Bouwens, C., Greer, S. U., Barker, T., Haslem, D. S., Ford, J. M., Fulde, G., Ji, H. P., Nadauld, L. D. 2020; 10 (1): 5009

    Abstract

    DNA copy number aberrations (CNA) are frequently observed in colorectal cancers (CRC). There is an urgent need for CNA-based biomarkers in clinics. For Stage III CRC, if combined with imaging or pathologic evidence, these markers promise more precise care. We conducted this Stage III specific biomarker discovery with a cohort of 134 CRCs, and with a newly developed high-efficiency CNA profiling protocol. Specifically, we developed the profiling protocol for tumor-normal matched tissue samples based on low-coverage clinical whole-genome sequencing (WGS). We demonstrated the protocol's accuracy and robustness by a systematic benchmark with microarray, high-coverage whole-exome and -genome approaches, where the low-coverage WGS-derived CNA segments were highly accordant (PCC >0.95) with those derived from microarray, and they were substantially less variable if compared to exome-derived segments. A lasso-based model and multivariate Cox regression analysis identified a chromosome 17p loss, containing the TP53 tumor suppressor gene, that was significantly associated with reduced survival (P = 0.0139, HR = 1.688, 95% CI = [1.112-2.562]), which was validated by an independent cohort of 187 Stage III CRCs. In summary, this low-coverage WGS protocol has high sensitivity, high resolution and low cost and the identified 17p-loss is an effective poor prognosis marker for Stage III patients.

    View details for DOI 10.1038/s41598-020-61643-6

    View details for PubMedID 32193467

  • One Size Does Not Fit All: Marked Heterogeneity in Incidence of and Survival from Gastric Cancer among Asian American Subgroups. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Huang, R. J., Sharp, N., Talamoa, R. O., Ji, H. P., Hwang, J. H., Palaniappan, L. P. 2020

    Abstract

    Asian Americans are at higher risk for non-cardia gastric cancers (NCGCs) relative to non-Hispanic Whites (NHWs). Asian Americans are genetically, linguistically, and culturally heterogeneous, yet have mostly been treated as a single population in prior studies. This aggregation may obscure important subgroup-specific cancer patterns. We utilized data from 13 regional United States cancer registries from 1990-2014 to determine secular trends in incidence and survivorship from NCGC. Data were analyzed for NHWs and the six largest Asian American subgroups: Chinese, Japanese, Filipino, Korean, Vietnamese, and South Asian (Indian/Pakistani). There exists substantial heterogeneity in NCGC incidence between Asian subgroups, with Koreans (48.6 per 100,000 person-years) having seven-fold higher age-adjusted incidence than South Asians (7.4 per 100,000 person-years). Asians had generally earlier stages of diagnosis and higher rates of surgical resection compared to NHWs. All Asian subgroups also demonstrated higher five-year observed survival compared to NHWs, with Koreans (41.3%) and South Asians (42.8%) having survival double that of NHWs (20.1%, p<0.001). In multivariable regression, differences in stage of diagnosis and rates of resection partially explained the difference in survivorship between Asian subgroups. We find substantial differences in incidence, staging, histology, treatment, and survivorship from NCGC between Asian subgroups, data which challenge our traditional perceptions about gastric cancer in Asians. Both biological heterogeneity and cultural/environmental differences may underlie these findings. These data are relevant to the national discourse regarding the appropriate role of gastric cancer screening, and identify high-risk racial/ethnic subgroups who may benefit from customized risk attenuation programs.

    View details for DOI 10.1158/1055-9965.EPI-19-1482

    View details for PubMedID 32152216

  • Single cell genomic characterization reveals the cellular reprogramming of the gastric tumor microenvironment. Clinical cancer research : an official journal of the American Association for Cancer Research Sathe, A., Grimes, S. M., Lau, B. T., Chen, J., Suarez, C., Huang, R. J., Poultsides, G. A., Ji, H. P. 2020

    Abstract

    The tumor microenvironment (TME) consists of a heterogeneous cellular milieu that can influence cancer cell behavior. Its characteristics have an impact on treatments such as immunotherapy. These features can be revealed with single-cell RNA sequencing (scRNA-seq). We hypothesized that scRNA-seq analysis of gastric cancer (GC) together with paired normal tissue and peripheral blood mononuclear cells (PBMCs) would identify critical elements of cellular deregulation not apparent with other approaches. scRNA-seq was conducted on seven patients with GC and one patient with intestinal metaplasia. We sequenced 56,167 cells comprising GC (32,407 cells), paired normal tissue (18,657 cells) and PBMCs (5,103 cells). Protein expression was validated by multiplex immunofluorescence. Tumor epithelium had copy number alterations and a gene expression program distinct from normal, with intra-tumor heterogeneity. GC TME was significantly enriched for stromal cells, macrophages, dendritic cells (DCs) and Tregs. TME-exclusive stromal cells expressed extracellular matrix components distinct from those of normal tissue. Macrophages were transcriptionally heterogeneous and did not conform to a binary M1/M2 paradigm. Tumor-DCs had a unique gene expression program compared to PBMC DCs. TME-specific cytotoxic T cells were exhausted with two heterogeneous subsets. Helper, cytotoxic T, Treg and NK cells expressed multiple immune checkpoint or costimulatory molecules. Receptor-ligand analysis revealed TME-exclusive inter-cellular communication. Single-cell gene expression studies revealed widespread reprogramming across multiple cellular elements in the GC TME. Cellular remodeling was delineated by changes in cell numbers, transcriptional states and inter-cellular interactions. This characterization facilitates understanding of tumor biology and enables identification of novel targets including for immunotherapy.

    View details for DOI 10.1158/1078-0432.CCR-19-3231

    View details for PubMedID 32060101

  • Ultra-fast detection and quantification of nucleic acids by amplification-free fluorescence assay. The Analyst Uhd, J., Miotke, L., Ji, H. P., Dunaeva, M., Pruijn, G. J., Jørgensen, C. D., Kristoffersen, E. L., Birkedal, V., Yde, C. W., Nielsen, F. C., Hansen, J., Astakhova, K. 2020

    Abstract

    Two types of clinically important nucleic acid biomarkers, microRNA (miRNA) and circulating tumor DNA (ctDNA), were detected and quantified from human serum using an amplification-free fluorescence hybridization assay. Specifically, miRNAs hsa-miR-223-3p and hsa-miR-486-5p with relevance for rheumatoid arthritis and cancer related mutations BRAF and KRAS of ctDNA were directly measured. The required oligonucleotide probes for the assay were rationally designed and synthesized through a novel "clickable" approach which is time- and cost-effective. With no need for isolating nucleic acid components from serum, the fluorescence-based assay took only 1 hour. Detection and absolute quantification of targets was successfully achieved despite their notoriously low abundance, with a precision down to individual nucleotides. Obtained miRNA and ctDNA amounts showed overall a good correlation with current techniques. With appropriate probes, our novel assay and signal boosting approach could become a useful tool for point-of-care measurement of other low abundance nucleic acid biomarkers.

    View details for DOI 10.1039/d0an00676a

    View details for PubMedID 32648858

  • Strain-resolved microbiome sequencing reveals mobile elements that drive bacterial competition on a clinical timescale. Genome medicine Zlitni, S., Bishara, A., Moss, E. L., Tkachenko, E., Kang, J. B., Culver, R. N., Andermann, T. M., Weng, Z., Wood, C., Handy, C., Ji, H. P., Batzoglou, S., Bhatt, A. S. 2020; 12 (1): 50

    Abstract

    Populations of closely related microbial strains can be simultaneously present in bacterial communities such as the human gut microbiome. We recently developed a de novo genome assembly approach that uses read cloud sequencing to provide more complete microbial genome drafts, enabling precise differentiation and tracking of strain-level dynamics across metagenomic samples. In this case study, we present a proof-of-concept using read cloud sequencing to describe bacterial strain diversity in the gut microbiome of one hematopoietic cell transplantation patient over a 2-month time course and highlight temporal strain variation of gut microbes during therapy. The treatment was accompanied by diet changes and administration of multiple immunosuppressants and antimicrobials. We conducted short-read and read cloud metagenomic sequencing of DNA extracted from four longitudinal stool samples collected during the course of treatment of one hematopoietic cell transplantation (HCT) patient. After applying read cloud metagenomic assembly to discover strain-level sequence variants in these complex microbiome samples, we performed metatranscriptomic analysis to investigate differential expression of antibiotic resistance genes. Finally, we validated predictions from the genomic and metatranscriptomic findings through in vitro antibiotic susceptibility testing and whole genome sequencing of isolates derived from the patient stool samples. During the 56-day longitudinal time course that was studied, the patient's microbiome was profoundly disrupted and eventually dominated by Bacteroides caccae. Comparative analysis of B. caccae genomes obtained using read cloud sequencing together with metagenomic RNA sequencing allowed us to identify differences in substrain populations over time. Based on this, we predicted that particular mobile element integrations likely resulted in increased antibiotic resistance, which we further supported using in vitro antibiotic susceptibility testing. We find read cloud assembly to be useful in identifying key structural genomic strain variants within a metagenomic sample. These strains have fluctuating relative abundance over relatively short time periods in human microbiomes. We also find specific structural genomic variations that are associated with increased antibiotic resistance over the course of clinical treatment.

    View details for DOI 10.1186/s13073-020-00747-0

    View details for PubMedID 32471482

  • Joint single cell DNA-seq and RNA-seq of gastric cancer cell lines reveals rules of in vitro evolution. NAR genomics and bioinformatics Andor, N., Lau, B. T., Catalanotti, C., Sathe, A., Kubit, M., Chen, J., Blaj, C., Cherry, A., Bangs, C. D., Grimes, S. M., Suarez, C. J., Ji, H. P. 2020; 2 (2): lqaa016

    Abstract

    Cancer cell lines are not homogeneous nor are they static in their genetic state and biological properties. Genetic, transcriptional and phenotypic diversity within cell lines contributes to the lack of experimental reproducibility frequently observed in tissue-culture-based studies. While cancer cell line heterogeneity has been generally recognized, there are no studies which quantify the number of clones that coexist within cell lines and their distinguishing characteristics. We used a single-cell DNA sequencing approach to characterize the cellular diversity within nine gastric cancer cell lines and integrated this information with single-cell RNA sequencing. Overall, we sequenced the genomes of 8824 cells, identifying between 2 and 12 clones per cell line. Using the transcriptomes of more than 28 000 single cells from the same cell lines, we independently corroborated 88% of the clonal structure determined from single cell DNA analysis. For one of these cell lines, we identified cell surface markers that distinguished two subpopulations and used flow cytometry to sort these two clones. We identified substantial proportions of replicating cells in each cell line, assigned these cells to subclones detected among the G0/G1 population and used the proportion of replicating cells per subclone as a surrogate of each subclone's growth rate.

    View details for DOI 10.1093/nargab/lqaa016

    View details for PubMedID 32215369

    View details for PubMedCentralID PMC7079336

  • Site to Site Comparison of Follicular Lymphoma Biopsies By Single Cell RNA Sequencing Haebe, S., Shree, T., Sathe, A., Day, G., Lee, H., Czerwinski, D. K., Grimes, S., Ji, H., Levy, R. AMER SOC HEMATOLOGY. 2019
  • Dynamic Immune Modulation Seen By Single Cell RNA-Sequencing of Serial Lymphoma Biopsies in Patients Undergoing in Situ Vaccination Shree, T., Haebe, S., Sathe, A., Day, G., Lee, H., Czerwinski, D. K., Grimes, S., Ji, H., Levy, R. AMER SOC HEMATOLOGY. 2019
  • Single cell RNA sequencing of serial tumor and blood biopsies from lymphoma patients undergoing in situ vaccination Shree, T., Sathe, A., Ji, H., Levy, R. AMER ASSOC CANCER RESEARCH. 2019
  • iGRAMMy: Cloud-based characterization of microbial landscape in colorectal cancers Xia, L. C., Ai, D., Guo, M., Ji, H. AMER ASSOC CANCER RESEARCH. 2019
  • Single cell RNA sequencing reveals multiple adaptive resistance mechanisms to regorafenib in colon cancer Sathe, A., Lau, B. T., Grimes, S., Greer, S., Ji, H. AMER ASSOC CANCER RESEARCH. 2019
  • Comprehensive characterization of gastric cancer at single-cell resolution Chen, J., Sathe, A., Grimes, S., Greer, S., Lau, B., Renschler, A., Poultsides, G., Suarez, C., Ji, H. AMER ASSOC CANCER RESEARCH. 2019
  • A functional CRISPR/Cas9 screen identifies kinases that modulate FGFR inhibitor response in gastric cancer ONCOGENESIS Chen, J., Bell, J., Lau, B. T., Whittaker, T., Stapleton, D., Ji, H. P. 2019; 8
  • Structural variant analysis for linked-read sequencing data with gemtools. Bioinformatics (Oxford, England) Greer, S. U., Ji, H. P. 2019; 35 (21): 4397–99

    Abstract

    SUMMARY: Linked-read sequencing generates synthetic long reads which are useful for the detection and analysis of structural variants (SVs). The software associated with 10X Genomics linked-read sequencing, Long Ranger, generates the essential output files (BAM, VCF, SV BEDPE) necessary for downstream analyses. However, performing downstream analyses requires users to customize their own tools to handle the unique features of linked-read sequencing data. Here, we describe gemtools, a collection of tools for the downstream and in-depth analysis of structural variants from linked-read data. Gemtools uses the barcoded aligned reads and the megabase-scale phase blocks to determine haplotypes of structural variant breakpoints and delineate complex breakpoint configurations at the resolution of single DNA molecules. The gemtools package is a suite of tools that provides the user with the flexibility to perform basic functions on their linked-read sequencing output in order to address even more questions. AVAILABILITY AND IMPLEMENTATION: The gemtools package is freely available for download at: https://github.com/sgreer77/gemtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for PubMedID 30938757

  • Single-cell transcriptome analysis identifies distinct cell types and niche signaling in a primary gastric organoid model. Scientific reports Chen, J., Lau, B. T., Andor, N., Grimes, S. M., Handy, C., Wood-Bouwens, C., Ji, H. P. 2019; 9 (1): 4536

    Abstract

    The diverse cellular milieu of the gastric tissue microenvironment plays a critical role in normal tissue homeostasis and tumor development. However, few cell culture models can recapitulate the tissue microenvironment and intercellular signaling in vitro. We used a primary tissue culture system to generate a murine p53 null gastric tissue model containing both epithelium and mesenchymal stroma. To characterize the microenvironment and niche signaling, we used single cell RNA sequencing (scRNA-Seq) to determine the transcriptomes of 4,391 individual cells. Based on specific markers, we identified epithelial cells, fibroblasts and macrophages in initial tissue explants during organoid formation. The majority of macrophages were polarized towards the wound-healing and tumor-promoting M2 type. Over time, the organoids maintained both epithelial and fibroblast lineages with the features of immature mouse stomach. We detected a subset of cells in both lineages expressing Lgr5, a stem cell marker. We examined lineage-specific Wnt signaling activation and found that Rspo3 was specifically expressed in the fibroblast lineage, providing an endogenous source of R-spondin to activate Wnt signaling. Our studies demonstrate that this primary tissue culture system enables one to study gastric tissue niche signaling and immune response in vitro.

    View details for PubMedID 30872643

  • Single-cell transcriptome analysis identifies distinct cell types and niche signaling in a primary gastric organoid model SCIENTIFIC REPORTS Chen, J., Lau, B. T., Andor, N., Grimes, S. M., Handy, C., Wood-Bouwens, C., Ji, H. P. 2019; 9
  • Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2. Nucleic acids research Zhou, B., Ho, S. S., Greer, S. U., Spies, N., Bell, J. M., Zhang, X., Zhu, X., Arthur, J. G., Byeon, S., Pattni, R., Saha, I., Huang, Y., Song, G., Perrin, D., Wong, W. H., Ji, H. P., Abyzov, A., Urban, A. E. 2019

    Abstract

    HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.

    View details for PubMedID 30864654

  • Single-cell RNA-Seq of follicular lymphoma reveals malignant B-cell types and coexpression of T-cell immune checkpoints BLOOD Andor, N., Simonds, E. F., Czerwinski, D. K., Chen, J., Grimes, S. M., Wood-Bouwens, C., Zheng, G. Y., Kubit, M. A., Greer, S., Weiss, W. A., Levy, R., Ji, H. P. 2019; 133 (10): 1119–29
  • Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562 GENOME RESEARCH Zhou, B., Ho, S. S., Greer, S. U., Zhu, X., Bell, J. M., Arthur, J. G., Spies, N., Zhang, X., Byeon, S., Pattni, R., Ben-Efraim, N., Haney, M. S., Haraksingh, R. R., Song, G., Ji, H. P., Perrin, D., Wong, W. H., Abyzov, A., Urban, A. E. 2019; 29 (3): 472–84
  • Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562. Genome research Zhou, B., Ho, S. S., Greer, S. U., Zhu, X., Bell, J. M., Arthur, J. G., Spies, N., Zhang, X., Byeon, S., Pattni, R., Ben-Efraim, N., Haney, M. S., Haraksingh, R. R., Song, G., Ji, H. P., Perrin, D., Wong, W. H., Abyzov, A., Urban, A. E. 2019

    Abstract

    K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also the most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.

    View details for PubMedID 30737237

  • Targeted short read sequencing and assembly of re-arrangements and candidate gene loci provide megabase diplotypes. Nucleic acids research Shin, G., Greer, S. U., Xia, L. C., Lee, H., Zhou, J., Boles, T. C., Ji, H. P. 2019

    Abstract

    The human genome is composed of two haplotypes, otherwise called diplotypes, which denote phased polymorphisms and structural variations (SVs) that are derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, context of SV, regulation of gene expression and other features which are informative for understanding human genetics. Successful diplotyping with short read whole genome sequencing generally requires either a large population or parent-child trio samples. To overcome these limitations, we developed a targeted sequencing method for generating megabase (Mb)-scale haplotypes with short reads. One selects specific 0.1-0.2 Mb high molecular weight DNA targets with custom-designed Cas9-guide RNA complexes followed by sequencing with barcoded linked reads. To test this approach, we designed three assays, targeting the BRCA1 gene, the entire 4-Mb major histocompatibility complex locus and 18 well-characterized SVs, respectively. Using an integrated alignment- and assembly-based approach, we generated comprehensive variant diplotypes spanning the entirety of the targeted loci and characterized SVs with exact breakpoints. Our results were comparable in quality to long read sequencing.

    View details for DOI 10.1093/nar/gkz661

    View details for PubMedID 31350896

  • scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome biology Alquicira-Hernandez, J., Sathe, A., Ji, H. P., Nguyen, Q., Powell, J. E. 2019; 20 (1): 264

    Abstract

    Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many tissues, as well as both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an individual cell based on its transcriptional profile. Here, we present scPred, a new generalizable method that is able to provide highly accurate classification of single cells, using a combination of unbiased feature selection from a reduced-dimension space and a machine-learning probability-based prediction method. We apply scPred to scRNA-seq data from pancreatic tissue, mononuclear cells, colorectal tumor biopsies, and circulating dendritic cells and show that scPred is able to classify individual cells with high accuracy. The generalized method is available at https://github.com/powellgenomicslab/scPred/.

    View details for DOI 10.1186/s13059-019-1862-5

    View details for PubMedID 31829268

  • Modeling the Evolution of Ploidy in a Resource Restricted Environment Kimmel, G., Barnholtz-Sloan, J., Ji, H., Altrock, P., Andor, N., Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. SPRINGER INTERNATIONAL PUBLISHING AG. 2019: 29–34
  • Therapeutic Monitoring of Circulating DNA Mutations in Metastatic Cancer with Personalized Digital PCR. The Journal of molecular diagnostics : JMD Wood-Bouwens, C. M., Haslem, D., Moulton, B., Almeda, A. F., Lee, H., Heestand, G. M., Nadauld, L. D., Ji, H. P. 2019

    Abstract

    As a high-performance solution for longitudinal monitoring of patients being treated for metastatic cancer, we developed a single-color digital PCR (dPCR) assay that detects and quantifies specific cancer mutations present in circulating tumor DNA (ctDNA). This customizable assay has a high sensitivity of detection. One can detect a mutation allelic fraction of 0.1%, equivalent to three mutation-bearing DNA molecules among 3,000 genome equivalents. The objective of this study was to validate the use of personalized dPCR mutation assays to monitor patients with metastatic cancer. We compared our digital PCR results to serum biomarkers indicating disease progression or response. Patients had metastatic colorectal, biliary, breast, lung and melanoma cancers. Mutations occurred in essential cancer drivers such as BRAF, KRAS and PIK3CA. We monitored patients over multiple cycles of treatment up to a year. All patients had detectable ctDNA mutations. Our results correlated with serum markers of metastatic cancer burden including CEA, CA-19-9, and CA-15-3, and corresponded qualitatively to imaging studies. We observed corresponding trends among these patients receiving active treatment with chemotherapy or targeted agents. For example, in one patient under active treatment, we detected increasing quantities of ctDNA molecules over time, indicating recurrence of tumor. Our study demonstrates that personalized digital PCR enables longitudinal monitoring of patients with metastatic cancer and may be a useful indicator for treatment response.

    View details for DOI 10.1016/j.jmoldx.2019.10.008

    View details for PubMedID 31837432

  • Covalent 'click chemistry'-based attachment of DNA onto solid phase enables iterative molecular analysis. Analytical chemistry Lau, B. T., Ji, H. P. 2019

    Abstract

    Molecular analysis of DNA samples with limited quantities can be challenging. Repeatedly sequencing the original DNA molecules from a given sample would overcome many issues related to accurate genetic analysis and mitigate issues with processing small amounts of DNA analyte. Moreover, an iterative, replicated analysis of the same DNA molecule has the potential to improve genetic characterization. Herein, we demonstrate that the use of 'click'-based attachment of DNA sequencing libraries onto an agarose bead support enables repetitive primer extension assays for specific genomic DNA targets such as gene exons. We validated the performance of this assay for evaluating specific genetic alterations in both normal and cancer reference standard DNA samples. We demonstrate the stability of conjugated DNA libraries and related sequencing results over the course of independent serial assays spanning several months from the same set of samples. Finally, we applied this method to DNA derived from a tumor sample and demonstrated improved mutation detection accuracy.

    View details for PubMedID 30652472

  • Single-cell RNA-Seq of lymphoma cancers reveals malignant B cell types and co-expression of T cell immune checkpoints. Blood Andor, N., Simonds, E. F., Czerwinski, D. K., Chen, J., Grimes, S. M., Wood-Bouwens, C., Zheng, G. X., Kubit, M. A., Greer, S., Weiss, W. A., Levy, R., Ji, H. P. 2018

    Abstract

    Follicular lymphoma (FL) is a low-grade B cell malignancy that transforms into a highly aggressive and lethal disease at a rate of 2% per year. Perfect isolation of the malignant B cell population from a surgical biopsy is a significant challenge, masking important FL biology, such as immune checkpoint co-expression patterns. To resolve the underlying transcriptional networks of follicular B cell lymphomas we analyzed the transcriptomes of 34,188 cells derived from six primary FL tumors. For each tumor, we identified normal immune subpopulations and malignant B cells based on gene expression. We used multicolor flow cytometry analysis of the same tumors to confirm our assignments of cellular lineages and validate our predictions of expressed proteins. Comparison of gene expression between matched malignant and normal B cells from the same patient revealed tumor-specific features. Malignant B cells exhibited restricted immunoglobulin light chain expression (either Ig Kappa or Ig Lambda), as well as the expected upregulation of the BCL2 gene, but also down-regulation of the FCER2, CD52 and MHC class II genes. By analyzing thousands of individual cells per patient tumor, we identified the mosaic of malignant B cell subclones that coexist within an FL and examined the characteristics of tumor-infiltrating T cells. We identified genes co-expressed with immune checkpoint molecules, such as CEBPA and B2M in Tregs, providing a better understanding of the gene networks involved in immune regulation. In summary, parallel measurement of single-cell expression in thousands of tumor cells and tumor-infiltrating lymphocytes can be used to obtain a systems-level view of the tumor microenvironment and identify new avenues for therapeutic development.

    View details for PubMedID 30591526

  • Single Cell RNA Sequencing of Serial Tumor and Blood Biopsies from Lymphoma Patients on an in Situ Vaccination Clinical Trial Shree, T., Sathe, A., Czerwinski, D. K., Long, S. R., Ji, H., Levy, R. AMER SOC HEMATOLOGY. 2018
  • SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. GigaScience Xia, L. C., Ai, D., Lee, H., Andor, N., Li, C., Zhang, N. R., Ji, H. P. 2018

    Abstract

    Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, there are only a few data simulators that provide structural variants in silico and even fewer that provide variants with different allelic fractions and haplotypes. Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design process enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions and translocations. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. To improve the compute speed, SVEngine is highly parallelized to reduce the simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime compared with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use at: https://bitbucket.org/charade/svengine.

    View details for PubMedID 29982625

  • SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution GIGASCIENCE Xia, L., Ai, D., Lee, H., Andor, N., Li, C., Zhang, N. R., Ji, H. P. 2018; 7 (7)
  • Mapping the comprehensive landscape of missense-mutation neoantigens across the human genome Lee, H., Greer, S. U., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • Chromosome-scale haplotyping enables comprehensive discovery of cancer rearrangements and germline-related susceptibility mutations Greer, S. U., Lau, B. T., Nadauld, L. D., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • Highly sensitive digital detection of circulating DNA cancer mutations using synthetic genome standards Wood-Bouwens, C. M., St Onge, R. P., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • Integrated single-cell DNA and RNA analysis of intratumoral heterogeneity and immune lineages in colorectal and gastric tumor biopsies Lau, B., Andor, N., Sathe, A., Wood-Bouwens, C., Poultsides, G., Ji, H. AMER ASSOC CANCER RESEARCH. 2018
  • Characterization of colorectal liver metastasis at single-cell resolution reveals dynamic interplay in the tumor microenvironment Sathe, A., Chen, J., Wood-Bouwens, C., Almeda, A., Lau, B., Grimes, S. M., Poultsides, G. A., Ji, H. AMER ASSOC CANCER RESEARCH. 2018
  • Improved detection and identification of microsatellite instability features in colorectal cancer: Implications for immunotherapy Shin, G., Lee, H., Grimes, S. M., Kubit, M. A., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • High-quality CNV segments from low-coverage whole genome sequencing from FFPE cancer biopsies based on an evaluation of multiple CNV tools Lee, H., Xia, L., Greer, S., Bell, J., Grimes, S. M., Bouwens, C., Shin, G., Lau, B. C., Johnson, L., Andor, N., Day, K., Miller, M., Escobar, H., Nadauld, L., Ji, H. P., Van Hummelen, P. AMER ASSOC CANCER RESEARCH. 2018
  • Loss of TP53 as a prognostic biomarker of poor survival in stage III colorectal cancer patients. Nadauld, L., Van Hummelen, P., Xia, L., Day, K., Lee, H., Bell, J., Grimes, S. M., Kubit, M., Miller, M., Shin, G., Wood, C., Greer, S., Escobar, H., Haslem, D. S., Ji, H. AMER SOC CLINICAL ONCOLOGY. 2018
  • Identification of large rearrangements in cancer genomes with barcode linked reads. Nucleic acids research Xia, L. C., Bell, J. M., Wood-Bouwens, C., Chen, J. J., Zhang, N. R., Ji, H. P. 2018; 46 (4): e19

    Abstract

    Large genomic rearrangements involve inversions, deletions and other structural changes that span megabase segments of the human genome. This category of genetic aberration is the cause of many hereditary genetic disorders and contributes to pathogenesis of diseases like cancer. We developed a new algorithm called ZoomX for analysing barcode-linked sequence reads; these sequences can be traced to individual high molecular weight DNA molecules (>50 kb). To generate barcode-linked sequence reads, we employ a library preparation technology (10X Genomics) that uses droplets to partition and barcode DNA molecules. Using linked-read data from whole genome sequencing, we identify large genomic rearrangements, typically greater than 200 kb, even when they are only present in low allelic fractions. Our algorithm uses a Poisson scan statistic to identify genomic rearrangement junctions, determines counts of junction-spanning molecules and applies Fisher's exact test to assess the statistical significance of somatic aberrations. Utilizing a well-characterized human genome, we benchmarked the ability of this approach to accurately identify large rearrangements. Subsequently, we demonstrated that our algorithm identifies somatic rearrangements when present in lower allelic fractions, as occurs in tumors. We characterized a set of complex cancer rearrangements with multiple classes of structural aberrations and with possible roles in oncogenesis.

    View details for PubMedID 29186506
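
    The abstract above mentions Fisher's exact test on counts of junction-spanning molecules. As a rough illustration of that kind of test, here is a one-sided Fisher's exact test written with the Python standard library only. It is not the ZoomX implementation: the 2x2 table layout (tumor versus normal, junction-spanning versus other molecules), the function name and the example counts are all assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with all
    row and column margins held fixed.
    """
    row1 = a + b          # e.g. molecules sampled from the tumor (assumed layout)
    col1 = a + c          # e.g. junction-spanning molecules across both samples
    n = a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = 0
    for k in range(a, upper + 1):
        # Ways to place k of the col1 "successes" in row 1 and the rest in row 2.
        tail += comb(row1, k) * comb(n - row1, col1 - k)
    return tail / denom


# Tiny table that can be checked by hand: 1 / C(6, 3) = 1/20.
p_toy = fisher_exact_greater(3, 0, 0, 3)

# Hypothetical counts: 8 of 100 tumor molecules span a junction, 0 of 100 normal.
p_junction = fisher_exact_greater(8, 92, 0, 100)
```

    A small p-value for the hypothetical counts would suggest that junction support is enriched in the tumor sample relative to the matched normal, which is the sort of evidence a somatic call relies on.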

  • Single Color Multiplexed ddPCR Copy Number Measurements and Single Nucleotide Variant Genotyping DIGITAL PCR: METHODS AND PROTOCOLS Wood-Bouwens, C. M., Ji, H. P., Karlin-Neumann, G., Bizouarn, F. 2018; 1768: 323–33
  • Robust Multiplexed Clustering and Denoising of Digital PCR Assays by Data Gridding ANALYTICAL CHEMISTRY Lau, B. T., Wood-Bouwens, C., Ji, H. P. 2017; 89 (22): 11913–17

    Abstract

    Digital PCR (dPCR) relies on the analysis of individual partitions to accurately quantify nucleic acid species. The most widely used analysis method requires manual clustering through individual visual inspection. Some automated analysis methods have emerged but do not robustly account for multiplexed targets, low target concentration, and assay noise. In this study, we describe an open source analysis software called Calico that uses "data gridding" to increase the sensitivity of clustering toward small clusters. Our workflow also generates quality score metrics in order to gauge and filter individual assay partitions by how well they were classified. We applied our analysis algorithm to multiplexed droplet-based digital PCR data sets in both EvaGreen and probe-based schemes, and targeted the oncogenic BRAF V600E and KRAS G12D mutations. We demonstrate an automated clustering sensitivity down to 0.1% mutant fraction and filtering of artifactual assay partitions from low quality DNA samples. Overall, we demonstrate a vastly improved approach to analyzing ddPCR data that can be applied to clinical use, where automation and reproducibility are critical.

    View details for PubMedID 29083143

  • Chromosome-scale mega-haplotypes enable digital karyotyping of cancer aneuploidy NUCLEIC ACIDS RESEARCH Bell, J. M., Lau, B. T., Greer, S. U., Wood-Bouwens, C., Xia, L. C., Connolly, I. D., Gephart, M. H., Ji, H. P. 2017; 45 (19): e162

    Abstract

    Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy where chromosomes are duplicated beyond normal diploid content. However, the accurate determination of aneuploidy events in cancer genomes is a challenge. Recent advances in sequencing technology allow the characterization of haplotypes that extend megabases along the human genome using high molecular weight (HMW) DNA. For this study, we employed a library preparation method in which sequence reads have barcodes linked to single HMW DNA molecules. Barcode-linked reads are used to generate extended haplotypes on the order of megabases. We developed a method that leverages haplotypes to identify chromosomal segmental alterations in cancer and uses this information to join haplotypes together, thus extending the range of phased variants. With this approach, we identified mega-haplotypes that encompass entire chromosome arms. We characterized the chromosomal arm changes and aneuploidy events in a manner that offers similar information as a traditional karyotype but with the benefit of DNA sequence resolution. We applied this approach to characterize aneuploidy and chromosomal alterations from a series of primary colorectal cancers.

    View details for PubMedID 28977555

    View details for PubMedCentralID PMC5737808

  • Synthetic lethality screen identifies novel druggable targets in the MYC pathway Li, Y., Deutzmann, A., Bell, J., Ji, H., Felsher, D. AMER ASSOC CANCER RESEARCH. 2017
  • Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes BMC GENOMICS Lau, B. T., Ji, H. P. 2017; 18: 745

    Abstract

    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and their use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.

    View details for PubMedID 28934929
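
    The abstract above contrasts random-mer barcodes with error-correcting barcodes. The general idea behind error correction against a known barcode set can be sketched in Python as nearest-codeword decoding by Hamming distance. The paper's actual barcode design is not described here, so the whitelist, the distance threshold and the function names below are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, whitelist, max_dist=1):
    """Map an observed barcode to the unique whitelist entry within max_dist.

    Returns None when no entry, or more than one entry, is close enough,
    so ambiguous reads are discarded rather than guessed.
    """
    hits = [bc for bc in whitelist if hamming(observed, bc) <= max_dist]
    return hits[0] if len(hits) == 1 else None


# Hypothetical whitelist whose members differ pairwise in at least 3 positions,
# so any single substitution error stays closest to its true barcode.
WHITELIST = ["AAAA", "CCGG", "GTGT"]

fixed = correct_barcode("AAAT", WHITELIST)      # one mismatch from "AAAA"
rejected = correct_barcode("CAGT", WHITELIST)   # at least 2 mismatches from every entry
```

    A purely random barcode has no such whitelist to decode against, so a sequencing error simply produces what looks like a new molecule, which is the false-positive counting problem the abstract describes.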

  • Single-Color Digital PCR Provides High-Performance Detection of Cancer Mutations from Circulating DNA. The Journal of molecular diagnostics : JMD Wood-Bouwens, C., Lau, B. T., Handy, C. M., Lee, H., Ji, H. P. 2017; 19 (5): 697-710

    Abstract

    We describe a single-color digital PCR assay that detects and quantifies cancer mutations directly from circulating DNA collected from the plasma of cancer patients. This approach relies on a double-stranded DNA intercalator dye and paired allele-specific DNA primer sets to determine an absolute count of both the mutation and wild-type-bearing DNA molecules present in the sample. The cell-free DNA assay uses an input of 1 ng of nonamplified DNA, approximately 300 genome equivalents, and has a molecular limit of detection of three mutation DNA genome-equivalent molecules per assay reaction. When using more genome equivalents as input, we demonstrated a sensitivity of 0.10% for detecting the BRAF V600E and KRAS G12D mutations. We developed several mutation assays specific to the cancer driver mutations of patients' tumors and detected these same mutations directly from the nonamplified, circulating cell-free DNA. This rapid and high-performance digital PCR assay can be configured to detect specific cancer mutations unique to an individual cancer, making it a potentially valuable method for patient-specific longitudinal monitoring.

    View details for DOI 10.1016/j.jmoldx.2017.05.003

    View details for PubMedID 28818432
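
    The quantities in the abstract above are linked by simple arithmetic, sketched below in Python. The figure of about 3.3 pg of DNA per haploid human genome is an assumption (a commonly used approximation, not stated in the abstract), the function names are hypothetical, and the 3,000 genome equivalents used for the 0.10% case is an illustrative input.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(dna_ng):
    """Approximate number of haploid genome copies in `dna_ng` nanograms of DNA."""
    return dna_ng * 1000.0 / HAPLOID_GENOME_PG


def mutant_allele_fraction(mutant_molecules, total_molecules):
    """Fraction of counted molecules that carry the mutation."""
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    return mutant_molecules / total_molecules


ge_per_ng = genome_equivalents(1.0)          # roughly 300 genome equivalents
maf = mutant_allele_fraction(3, 3000)        # 3 mutant molecules in 3,000
maf_percent = maf * 100                      # expressed as a percentage
```

    Under these assumptions, 1 ng corresponds to roughly 300 genome equivalents, in line with the abstract, and a three-molecule limit of detection reaches a 0.10% allelic fraction only when about ten times that input is analysed.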

  • Intestinal Enteroendocrine Lineage Cells Possess Homeostatic and Injury-Inducible Stem Cell Activity. Cell stem cell Yan, K. S., Gevaert, O., Zheng, G. X., Anchang, B., Probert, C. S., Larkin, K. A., Davies, P. S., Cheng, Z. F., Kaddis, J. S., Han, A., Roelf, K., Calderon, R. I., Cynn, E., Hu, X., Mandleywala, K., Wilhelmy, J., Grimes, S. M., Corney, D. C., Boutet, S. C., Terry, J. M., Belgrader, P., Ziraldo, S. B., Mikkelsen, T. S., Wang, F., von Furstenberg, R. J., Smith, N. R., Chandrakesan, P., May, R., Chrissy, M. A., Jain, R., Cartwright, C. A., Niland, J. C., Hong, Y. K., Carrington, J., Breault, D. T., Epstein, J., Houchen, C. W., Lynch, J. P., Martin, M. G., Plevritis, S. K., Curtis, C., Ji, H. P., Li, L., Henning, S. J., Wong, M. H., Kuo, C. J. 2017; 21 (1): 78-90.e6

    Abstract

    Several cell populations have been reported to possess intestinal stem cell (ISC) activity during homeostasis and injury-induced regeneration. Here, we explored inter-relationships between putative mouse ISC populations by comparative RNA-sequencing (RNA-seq). The transcriptomes of multiple cycling ISC populations closely resembled Lgr5+ ISCs, the most well-defined ISC pool, but Bmi1-GFP+ cells were distinct and enriched for enteroendocrine (EE) markers, including Prox1. Prox1-GFP+ cells exhibited sustained clonogenic growth in vitro, and lineage-tracing of Prox1+ cells revealed long-lived clones during homeostasis and after radiation-induced injury in vivo. Single-cell mRNA-seq revealed two subsets of Prox1-GFP+ cells, one of which resembled mature EE cells while the other displayed low-level EE gene expression but co-expressed tuft cell markers, Lgr5 and Ascl2, reminiscent of label-retaining secretory progenitors. Our data suggest that the EE lineage, including mature EE cells, comprises a reservoir of homeostatic and injury-inducible ISCs, extending our understanding of cellular plasticity and stemness.

    View details for DOI 10.1016/j.stem.2017.06.014

    View details for PubMedID 28686870

    View details for PubMedCentralID PMC5642297

  • Precision Oncology Strategy in Trastuzumab-Resistant Human Epidermal Growth Factor Receptor 2-Positive Colon Cancer: Case Report of Durable Response to Ado-Trastuzumab Emtansine. JCO precision oncology Haslem, D. S., Ji, H. P., Ford, J. M., Nadauld, L. D. 2017; 1

    View details for DOI 10.1200/PO.16.00055

    View details for PubMedID 32913966

    View details for PubMedCentralID PMC7446358

  • Genomic Instability in Cancer: Teetering on the Limit of Tolerance CANCER RESEARCH Andor, N., Maley, C. C., Ji, H. P. 2017; 77 (9): 2179-2185

    Abstract

    Cancer genomic instability contributes to the phenomenon of intratumoral genetic heterogeneity, provides the genetic diversity required for natural selection, and enables the extensive phenotypic diversity that is frequently observed among patients. Genomic instability has previously been associated with poor prognosis. However, we have evidence that for solid tumors of epithelial origin, extreme levels of genomic instability, where more than 75% of the genome is subject to somatic copy number alterations, are associated with a potentially better prognosis compared with intermediate levels under this threshold. This has been observed in clonal subpopulations of larger size, especially when genomic instability is shared among a limited number of clones. We hypothesize that cancers with extreme levels of genomic instability may be teetering on the brink of a threshold where so much of their genome is adversely altered that cells rarely replicate successfully. Another possibility is that tumors with high levels of genomic instability are more immunogenic than other cancers with a less extensive burden of genetic aberrations. Regardless of the exact mechanism, but hinging on our ability to quantify how a tumor's burden of genetic aberrations is distributed among coexisting clones, genomic instability has important therapeutic implications. Herein, we explore the possibility that a high genomic instability could be the basis for a tumor's sensitivity to DNA-damaging therapies. We primarily focus on studies of epithelial-derived solid tumors. Cancer Res; 77(9); 2179-85. ©2017 AACR.

    View details for DOI 10.1158/0008-5472.CAN-16-1553

    View details for Web of Science ID 000400270100001

    View details for PubMedID 28432052

    View details for PubMedCentralID PMC5413432

  • Tandem Oligonucleotide Probe Annealing and Elongation To Discriminate Viral Sequence ANALYTICAL CHEMISTRY Taskova, M., Uhd, J., Miotke, L., Kubit, M., Bell, J., Ji, H. P., Astakhova, K. 2017; 89 (8): 4363-4366

    Abstract

    New approaches for genomic DNA/RNA detection are in high demand in order to provide controls for existing enzymatic technologies and to create alternatives for emerging applications. In particular, there is an unmet need in rapid, reliable detection of short RNA regions which could open up new opportunities in transcriptome analysis, virology, and other fields. Herein, we report for the first time a "click" chemistry approach to oligonucleotide probe elongation as a novel approach to specifically detect a viral sequence. We hybridized a library of short, terminally labeled probes to Ebola virus RNA followed by click assembly and analysis of the read sequence by various techniques. As we demonstrate in this paper, using our new approach, a viral RNA sequence can be detected in less than 2 h without the need for cDNA synthesis or any other enzymatic reactions and with a sensitivity of <10 pM target RNA.

    View details for DOI 10.1021/acs.analchem.7b00646

    View details for Web of Science ID 000399858800008

    View details for PubMedID 28382823

  • A Targeted Resequencing Approach to Identify Actionable Somatic Copy Number Alterations with High Sensitivity Alongside SNVs and Indels from Clinical Tumor Specimens De La Vega, F. M., Mendoza, D., Bouhlai, Y., Vilborg, A., Koehler, R., Pouliot, Y., Irvine, S., Trig, L., Goodsaid, F., Ji, H. P. ELSEVIER SCIENCE INC. 2017: S48
  • CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis NATURE COMMUNICATIONS Shin, G., Grimes, S. M., Lee, H., Lau, B. T., Xia, L. C., Ji, H. P. 2017; 8

    Abstract

    Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.

    View details for DOI 10.1038/ncomms14291

    View details for PubMedID 28169275

  • Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases. Genome medicine Greer, S. U., Nadauld, L. D., Lau, B. T., Chen, J., Wood-Bouwens, C., Ford, J. M., Kuo, C. J., Ji, H. P. 2017; 9 (1): 57

    Abstract

    Genome rearrangements are critical oncogenic driver events in many malignancies. However, the identification and resolution of the structure of cancer genomic rearrangements remain challenging even with whole genome sequencing. To identify oncogenic genomic rearrangements and resolve their structure, we analyzed linked read sequencing. This approach relies on a microfluidic droplet technology to produce libraries derived from single, high molecular weight DNA molecules, 50 kb in size or greater. After sequencing, the barcoded sequence reads provide long range genomic information, identify individual high molecular weight DNA molecules, determine the haplotype context of genetic variants that occur across contiguous megabase-length segments of the genome and delineate the structure of complex rearrangements. We applied linked read sequencing of whole genomes to the analysis of a set of synchronous metastatic diffuse gastric cancers that occurred in the same individual. When comparing metastatic sites, our analysis implicated a complex somatic rearrangement that was present in the metastatic tumor. The oncogenic event associated with the identified complex rearrangement resulted in an amplification of the known cancer driver gene FGFR2. With further investigation using these linked read data, the FGFR2 copy number alteration was determined to be a deletion-inversion motif that underwent tandem duplication, with unique breakpoints in each metastasis. Using a three-dimensional organoid tissue model, we functionally validated the metastatic potential of an FGFR2 amplification in gastric cancer. Our study demonstrates that linked read sequencing is useful in characterizing oncogenic rearrangements in cancer metastasis.

    View details for PubMedID 28629429

  • Massively Parallel Single Cell RNA-Seq of Primary Lymphomas Reveals Distinct Cellular Lineages and Diverse, Intratumoral Transcriptional States Andor, N., Simonds, E., Chen, J., Grimes, S., Wood, C., Czerwinski, D. K., Handy, C., Levy, R., Ji, H. P. AMER SOC HEMATOLOGY. 2016
  • A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic acids research Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R. 2016; 44 (15)

    Abstract

    We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

    View details for DOI 10.1093/nar/gkw481

    View details for PubMedID 27325742

    View details for PubMedCentralID PMC5009736

  • Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nature biotechnology Zheng, G. X., Lau, B. T., Schnall-Levin, M., Jarosz, M., Bell, J. M., Hindson, C. M., Kyriazopoulou-Panagiotopoulou, S., Masquelier, D. A., Merrill, L., Terry, J. M., Mudivarti, P. A., Wyatt, P. W., Bharadwaj, R., Makarewicz, A. J., Li, Y., Belgrader, P., Price, A. D., Lowe, A. J., Marks, P., Vurens, G. M., Hardenbol, P., Montesclaros, L., Luo, M., Greenfield, L., Wong, A., Birch, D. E., Short, S. W., Bjornson, K. P., Patel, P., Hopmans, E. S., Wood, C., Kaur, S., Lockwood, G. K., Stafford, D., Delaney, J. P., Wu, I., Ordonez, H. S., Grimes, S. M., Greer, S., Lee, J. Y., Belhocine, K., Giorda, K. M., Heaton, W. H., McDermott, G. P., Bent, Z. W., Meschi, F., Kondov, N. O., Wilson, R., Bernate, J. A., Gauby, S., Kindwall, A., Bermejo, C., Fehr, A. N., Chan, A., Saxonov, S., Ness, K. D., Hindson, B. J., Ji, H. P. 2016; 34 (3): 303-311

    Abstract

    Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

    View details for DOI 10.1038/nbt.3432

    View details for PubMedID 26829319

    View details for PubMedCentralID PMC4786454

  • Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nature medicine Andor, N., Graham, T. A., Jansen, M., Xia, L. C., Aktipis, C. A., Petritsch, C., Ji, H. P., Maley, C. C. 2016; 22 (1): 105-113

    Abstract

    Intratumor heterogeneity (ITH) drives neoplastic progression and therapeutic resistance. We used the bioinformatics tools 'expanding ploidy and allele frequency on nested subpopulations' (EXPANDS) and PyClone to detect clones that are present at a ≥10% frequency in 1,165 exome sequences from tumors in The Cancer Genome Atlas. 86% of tumors across 12 cancer types had at least two clones. ITH in the morphology of nuclei was associated with genetic ITH (Spearman's correlation coefficient, ρ = 0.24-0.41; P < 0.001). Mutation of a driver gene that typically appears in smaller clones was a survival risk factor (hazard ratio (HR) = 2.15, 95% confidence interval (CI): 1.71-2.69). The risk of mortality also increased when >2 clones coexisted in the same tumor sample (HR = 1.49, 95% CI: 1.20-1.87). In two independent data sets, copy-number alterations affecting either <25% or >75% of a tumor's genome predicted reduced risk (HR = 0.15, 95% CI: 0.08-0.29). Mortality risk also declined when >4 clones coexisted in the sample, suggesting a trade-off between the costs and benefits of genomic instability. ITH and genomic instability thus have the potential to be useful measures that can universally be applied to all cancers.

    View details for DOI 10.1038/nm.3984

    View details for PubMedID 26618723

  • Pan-cancer analysis of the etiology and consequences of intra-tumor heterogeneity Andor, N., Graham, T. A., Petritsch, C., Ji, H. P., Maley, C. C. AMER ASSOC CANCER RESEARCH. 2015
  • The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations GENOME MEDICINE Lee, H., Palm, J., Grimes, S. M., Ji, H. P. 2015; 7 (1): 112

    Abstract

    The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to user's input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypothesis regarding genomic/proteomic alterations across a broad spectrum of malignancies.

    View details for DOI 10.1186/s13073-015-0226-3

    View details for Web of Science ID 000363619100002

    View details for PubMedID 26507825

    View details for PubMedCentralID PMC4624593

  • Enzyme-Free Detection of Mutations in Cancer DNA Using Synthetic Oligonucleotide Probes and Fluorescence Microscopy PLOS ONE Miotke, L., Maity, A., Ji, H., Brewer, J., Astakhova, K. 2015; 10 (8)

    Abstract

    Rapid reliable diagnostics of DNA mutations are highly desirable in research and clinical assays. Current development in this field goes simultaneously in two directions: 1) high-throughput methods, and 2) portable assays. Non-enzymatic approaches are attractive for both types of methods since they would allow rapid and relatively inexpensive detection of nucleic acids. Modern fluorescence microscopy is having a huge impact on detection of biomolecules at previously unachievable resolution. However, no straightforward methods to detect DNA in a non-enzymatic way using fluorescence microscopy and nucleic acid analogues have been proposed so far. Here we report a novel enzyme-free approach to efficiently detect cancer mutations. This assay includes gene-specific target enrichment followed by annealing to oligonucleotides containing locked nucleic acids (LNAs) and finally, detection by fluorescence microscopy. The LNA containing probes display high binding affinity and specificity to DNA containing mutations, which allows for the detection of mutation abundance with an intercalating EvaGreen dye. We used a second probe, which increases the overall number of base pairs in order to produce a higher fluorescence signal by incorporating more dye molecules. Indeed we show here that using EvaGreen dye and LNA probes, genomic DNA containing BRAF V600E mutation could be detected by fluorescence microscopy at low femtomolar concentrations. Notably, this was at least 1000-fold above the potential detection limit. Overall, the novel assay we describe could become a new approach to rapid, reliable and enzyme-free diagnostics of cancer or other associated DNA targets. Importantly, stoichiometry of wild type and mutant targets is conserved in our assay, which allows for an accurate estimation of mutant abundance when the detection limit requirement is met. Using fluorescence microscopy, this approach presents the opportunity to detect DNA at single-molecule resolution and directly in the biological sample of choice.

    View details for DOI 10.1371/journal.pone.0136720

    View details for Web of Science ID 000360144000090

    View details for PubMedCentralID PMC4552304

  • A new multiple feature approach for rapid and highly accurate somatic structural variation discovery from whole cancer genome sequencing Xia, L. C., Bell, J., Chen, J., Zhang, N. R., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2015
  • Identification of novel tumor suppressor candidates and characterizing their potential driver role in familial cholangiocarcinoma Greer, S., Nadauld, L. D., Lau, B., Miotke, L., Hopmans, E., Wood, C. M., Bell, J. M., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2015
  • Megabase-scale phased haplotypes of genetic aberrations from whole cancer genome sequencing of primary colorectal tumors Lau, B., Bell, J. M., Schnall-Levin, M., Jarosz, M., Hopmans, E., Wood, C. M., Zheng, G. X., Giorda, K., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2015
  • Clonal structure analysis of cancer genomes at single molecule resolution Lau, B., Ji, H. AMER ASSOC CANCER RESEARCH. 2015
  • Pan-cancer analysis of the causes and consequences of Intra-tumor heterogeneity Andor, N., Graham, T. A., Aktipis, A. C., Petritsch, C., Ji, H. P., Maley, C. C. AMER ASSOC CANCER RESEARCH. 2015
  • Allele-specific copy number profiling by next-generation DNA sequencing. Nucleic acids research Chen, H., Bell, J. M., Zavala, N. A., Ji, H. P., Zhang, N. R. 2015; 43 (4)

    Abstract

    The progression and clonal development of tumors often involve amplifications and deletions of genomic DNA. Estimation of allele-specific copy number, which quantifies the number of copies of each allele at each variant locus rather than the total number of chromosome copies, is an important step in the characterization of tumor genomes and the inference of their clonal history. We describe a new method, falcon, for finding somatic allele-specific copy number changes by next generation sequencing of tumors with matched normals. falcon is based on a change-point model on a bivariate mixed Binomial process, which explicitly models the copy numbers of the two chromosome haplotypes and corrects for local allele-specific coverage biases. By using the Binomial distribution rather than a normal approximation, falcon more effectively pools evidence from sites with low coverage. A modified Bayesian information criterion is used to guide model selection for determining the number of copy number events. Falcon is evaluated on in silico spike-in data and applied to the analysis of a pre-malignant colon tumor sample and late-stage colorectal adenocarcinoma from the same individual. The allele-specific copy number estimates obtained by falcon allow us to draw detailed conclusions regarding the clonal history of the individual's colon cancer.

    View details for DOI 10.1093/nar/gku1252

    View details for PubMedID 25477383

  • Enzyme-Free Detection of Mutations in Cancer DNA Using Synthetic Oligonucleotide Probes and Fluorescence Microscopy. PloS one Miotke, L., Maity, A., Ji, H., Brewer, J., Astakhova, K. 2015; 10 (8)

    Abstract

    Rapid reliable diagnostics of DNA mutations are highly desirable in research and clinical assays. Current development in this field goes simultaneously in two directions: 1) high-throughput methods, and 2) portable assays. Non-enzymatic approaches are attractive for both types of methods since they would allow rapid and relatively inexpensive detection of nucleic acids. Modern fluorescence microscopy is having a huge impact on detection of biomolecules at previously unachievable resolution. However, no straightforward methods to detect DNA in a non-enzymatic way using fluorescence microscopy and nucleic acid analogues have been proposed so far. Here we report a novel enzyme-free approach to efficiently detect cancer mutations. This assay includes gene-specific target enrichment followed by annealing to oligonucleotides containing locked nucleic acids (LNAs) and finally, detection by fluorescence microscopy. The LNA containing probes display high binding affinity and specificity to DNA containing mutations, which allows for the detection of mutation abundance with an intercalating EvaGreen dye. We used a second probe, which increases the overall number of base pairs in order to produce a higher fluorescence signal by incorporating more dye molecules. Indeed we show here that using EvaGreen dye and LNA probes, genomic DNA containing BRAF V600E mutation could be detected by fluorescence microscopy at low femtomolar concentrations. Notably, this was at least 1000-fold above the potential detection limit. Overall, the novel assay we describe could become a new approach to rapid, reliable and enzyme-free diagnostics of cancer or other associated DNA targets. Importantly, stoichiometry of wild type and mutant targets is conserved in our assay, which allows for an accurate estimation of mutant abundance when the detection limit requirement is met. Using fluorescence microscopy, this approach presents the opportunity to detect DNA at single-molecule resolution and directly in the biological sample of choice.

    View details for DOI 10.1371/journal.pone.0136720

    View details for PubMedID 26312489

  • Emergence of Hemagglutinin Mutations During the Course of Influenza Infection. Scientific reports Cushing, A., Kamali, A., Winters, M., Hopmans, E. S., Bell, J. M., Grimes, S. M., Xia, L. C., Zhang, N. R., Moss, R. B., Holodniy, M., Ji, H. P. 2015; 5: 16178

    Abstract

    Influenza remains a significant cause of disease mortality. The ongoing threat of influenza infection is partly attributable to the emergence of new mutations in the influenza genome. Among the influenza viral gene products, the hemagglutinin (HA) glycoprotein plays a critical role in influenza pathogenesis, is the target for vaccines and accumulates new mutations that may alter the efficacy of immunization. To study the emergence of HA mutations during the course of infection, we employed a deep-targeted sequencing method. We used samples from 17 patients with active H1N1 or H3N2 influenza infections. These patients were not treated with antivirals. In addition, we had samples from five patients who were analyzed longitudinally. Thus, we determined the quantitative changes in the fractional representation of HA mutations during the course of infection. Across individuals in the study, a series of novel HA mutations that directly altered the HA coding sequence were identified. Serial viral sampling revealed HA mutations that either were stable, expanded or were reduced in representation during the course of the infection. Overall, we demonstrated the emergence of unique mutations specific to an infected individual and temporal genetic variation during infection.

    View details for DOI 10.1038/srep16178

    View details for PubMedID 26538451

  • Single-Color, Multiplexed, Droplet Digital PCR Analysis of the Clinical Significance of Hemizygous Loss of WRN Gene in Colorectal Cancer Lee, H., Lau, B., Zavala, N. A., Ji, H. P. ELSEVIER SCIENCE INC. 2014: 768
  • A robust and rapid targeted sequencing technology for iterative multiple genomic features in cancer Lau, B., Cushing, A., Ji, H. AMER ASSOC CANCER RESEARCH. 2014
  • Highly sensitive and specific digital quantification of cancer genetic aberrations Miotke, L. K., Lau, B., Rumma, R., Ji, H. AMER ASSOC CANCER RESEARCH. 2014
  • Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture NATURE MEDICINE Li, X., Nadauld, L., Ootani, A., Corney, D. C., Pai, R. K., Gevaert, O., Cantrell, M. A., Rack, P. G., Neal, J. T., Chan, C. W., Yeung, T., Gong, X., Yuan, J., Wilhelmy, J., Robine, S., Attardi, L. D., Plevritis, S. K., Hung, K. E., Chen, C., Ji, H. P., Kuo, C. J. 2014; 20 (7): 769-777

    Abstract

    The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.

    View details for DOI 10.1038/nm.3585

    View details for Web of Science ID 000338689500021

  • A programmable method for massively parallel targeted sequencing. Nucleic acids research Hopmans, E. S., Natsoulis, G., Bell, J. M., Grimes, S. M., Sieh, W., Ji, H. P. 2014; 42 (10)

    Abstract

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy.

    View details for DOI 10.1093/nar/gku282

    View details for PubMedID 24782526

  • Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture. Nature medicine Li, X., Nadauld, L., Ootani, A., Corney, D. C., Pai, R. K., Gevaert, O., Cantrell, M. A., Rack, P. G., Neal, J. T., Chan, C. W., Yeung, T., Gong, X., Yuan, J., Wilhelmy, J., Robine, S., Attardi, L. D., Plevritis, S. K., Hung, K. E., Chen, C. Z., Ji, H. P., Kuo, C. J. 2014

    View details for DOI 10.1038/nm.3585

    View details for PubMedID 24859528

  • High sensitivity detection and quantitation of DNA copy number and single nucleotide variants with single color droplet digital PCR. Analytical chemistry Miotke, L., Lau, B. T., Rumma, R. T., Ji, H. P. 2014; 86 (5): 2618-2624

    Abstract

    In this study, we present a highly customizable method for quantifying copy number and point mutations utilizing a single-color, droplet digital PCR platform. Droplet digital polymerase chain reaction (ddPCR) is rapidly replacing real-time quantitative PCR (qRT-PCR) as an efficient method of independent DNA quantification. Compared to quantitative PCR, ddPCR eliminates the need for traditional standards; instead, it measures target and reference DNA within the same well. The applications for ddPCR are widespread including targeted quantitation of genetic aberrations, which is commonly achieved with a two-color fluorescent oligonucleotide probe (TaqMan) design. However, the overall cost and need for optimization can be greatly reduced with an alternative method of distinguishing between target and reference products using the nonspecific DNA binding properties of EvaGreen (EG) dye. By manipulating the length of the target and reference amplicons, we can distinguish between their fluorescent signals and quantify each independently. We demonstrate the effectiveness of this method by examining copy number in the proto-oncogene FLT3 and the common V600E point mutation in BRAF. Using a series of well-characterized control samples and cancer cell lines, we confirmed the accuracy of our method in quantifying mutation percentage and integer value copy number changes. As another novel feature, our assay was able to detect a mutation comprising less than 1% of an otherwise wild-type sample, as well as copy number changes from cancers even in the context of significant dilution with normal DNA. This flexible and cost-effective method of independent DNA quantification proves to be a robust alternative to the commercialized TaqMan assay.

    View details for DOI 10.1021/ac403843j

    View details for PubMedID 24483992
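
    The abstract above does not spell out how droplet counts become a copy number. As a rough illustration only, the sketch below uses the Poisson correction commonly applied in digital PCR, where the mean number of template copies per droplet is estimated from the fraction of positive droplets. The function names, the diploid reference assumption, and the example counts are hypothetical and are not taken from the paper.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per droplet.

    Assumes templates are distributed randomly across droplets, so the
    fraction of negative droplets equals exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_positive: int, reference_positive: int,
                total: int, reference_copies: float = 2.0) -> float:
    """Target copy number relative to a reference locus.

    reference_copies=2.0 is an assumption (a diploid reference locus).
    """
    reference = copies_per_droplet(reference_positive, total)
    if reference == 0.0:
        raise ValueError("reference locus has no positive droplets")
    return reference_copies * copies_per_droplet(target_positive, total) / reference


if __name__ == "__main__":
    # Hypothetical counts from a single well of 15,000 accepted droplets.
    print(round(copy_number(6000, 3500, 15000), 2))
```

    Because target and reference are measured in the same well, the droplet volume cancels in the ratio, which is why no external standard curve appears in the calculation.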

  • A phase II study of capecitabine, carboplatin, and bevacizumab for metastatic or unresectable gastroesophageal junction and gastric adenocarcinoma. Kunz, P. L., Nandoskar, P., Koontz, M., Ji, H., Ford, J. M., Balise, R. R., Kamaya, A., Rubin, D., Fisher, G. A. AMER SOC CLINICAL ONCOLOGY. 2014
  • Metastatic tumor evolution and organoid modeling implicate TGFBR2 as a cancer driver in diffuse gastric cancer GENOME BIOLOGY Nadauld, L. D., Garcia, S., Natsoulis, G., Bell, J. M., Miotke, L., Hopmans, E. S., Xu, H., Pai, R. K., Palm, C., Regan, J. F., Chen, H., Flaherty, P., Ootani, A., Zhang, N. R., Ford, J. M., Kuo, C. J., Ji, H. P. 2014; 15 (8)

    Abstract

    Gastric cancer is the second-leading cause of global cancer deaths, with metastatic disease representing the primary cause of mortality. To identify candidate drivers involved in oncogenesis and tumor evolution, we conduct an extensive genome sequencing analysis of metastatic progression in a diffuse gastric cancer. This involves a comparison between a primary tumor from a hereditary diffuse gastric cancer syndrome proband and its recurrence as an ovarian metastasis. Both the primary tumor and ovarian metastasis have common biallelic loss-of-function of both the CDH1 and TP53 tumor suppressors, indicating a common genetic origin. While the primary tumor exhibits amplification of the Fibroblast growth factor receptor 2 (FGFR2) gene, the metastasis notably lacks FGFR2 amplification but rather possesses unique biallelic alterations of Transforming growth factor-beta receptor 2 (TGFBR2), indicating the divergent in vivo evolution of a TGFBR2-mutant metastatic clonal population in this patient. As TGFBR2 mutations have not previously been functionally validated in gastric cancer, we modeled the metastatic potential of TGFBR2 loss in a murine three-dimensional primary gastric organoid culture. The Tgfbr2 shRNA knockdown within Cdh1-/-; Tp53-/- organoids generates invasion in vitro and robust metastatic tumorigenicity in vivo, confirming Tgfbr2 metastasis suppressor activity. We document the metastatic differentiation and genetic heterogeneity of diffuse gastric cancer and reveal the potential metastatic role of TGFBR2 loss-of-function. In support of this study, we apply a murine primary organoid culture method capable of recapitulating in vivo metastatic gastric cancer. Overall, we describe an integrated approach to identify and functionally validate putative cancer drivers involved in metastasis.

    View details for DOI 10.1186/s13059-014-0428-9

    View details for Web of Science ID 000346604100009

    View details for PubMedID 25315765

    View details for PubMedCentralID PMC4145231

  • MendeLIMS: a web-based laboratory information management system for clinical genome sequencing. BMC bioinformatics Grimes, S. M., Ji, H. P. 2014; 15 (1): 290

    View details for DOI 10.1186/1471-2105-15-290

    View details for PubMedID 25159034

  • Identification of Insertion Deletion Mutations from Deep Targeted Resequencing. Journal of data mining in genomics & proteomics Natsoulis, G., Zhang, N., Welch, K., Bell, J., Ji, H. P. 2013; 4 (3)

    Abstract

    Taking advantage of the deep targeted sequencing capabilities of next generation sequencers, we have developed a novel two-step insertion deletion (indel) detection algorithm (IDA) that can determine indels from single read sequences with high computational efficiency and sensitivity when indels are present at a low fraction relative to the wild type reference sequence. First, it identifies candidate indel positions utilizing specific sequence alignment artifacts produced by rapid alignment programs. Second, it confirms the location of the candidate indel by using the Smith-Waterman (SW) algorithm on a restricted subset of sequence reads. We demonstrate that IDA is applicable to indels of varying sizes from deep targeted sequencing data at low fractions where the indel is diluted by wild type sequence. Our algorithm is useful in detecting indel variants present at variable allelic frequencies such as may occur in heterozygotes and mixed normal-tumor tissue.

    View details for PubMedID 24511426
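
    The second step described above relies on Smith-Waterman local alignment. The sketch below is a generic, minimal scoring implementation, not the IDA code: the scoring values and the linear gap penalty are assumptions chosen for illustration, and the abstract does not state which parameters the authors used.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming matrix, so memory
    use is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    read = "AAAACCCC"        # hypothetical read with a 1-base deletion
    reference = "AAAAGCCCC"  # hypothetical reference window
    print(smith_waterman_score(read, reference))
```

    Restricting this quadratic-time step to the reads flagged in the first step is what keeps the overall approach computationally efficient, as the abstract notes.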

  • Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis. BMC medical genomics Lee, H., Flaherty, P., Ji, H. P. 2013; 6: 54

    Abstract

    Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.

    View details for DOI 10.1186/1755-8794-6-54

    View details for PubMedID 24308539
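
    For readers unfamiliar with the elastic-net penalty mentioned above, one widely used parameterization for a continuous response is shown below. This is given for orientation only: the abstract does not state the response type, loss function, or parameterization the authors used, and the symbols here (n samples, feature vectors x_i, response y_i, tuning parameters lambda and alpha) are assumptions of this note.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr],
  \qquad \lambda \ge 0,\; 0 \le \alpha \le 1
```

    Setting alpha to 1 gives the lasso and alpha to 0 gives ridge regression; intermediate values shrink correlated features together while still driving many coefficients to exactly zero, which suits ranking genes by their regression coefficient.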

  • RVD: a command-line program for ultrasensitive rare single nucleotide variant detection using targeted next-generation DNA resequencing. BMC research notes Cushing, A., Flaherty, P., Hopmans, E., Bell, J. M., Ji, H. P. 2013; 6: 206

    Abstract

    Rare single nucleotide variants play an important role in genetic diversity and heterogeneity of specific human disease. For example, an individual clinical sample can harbor rare mutations at minor frequencies. Genetic diversity within an individual clinical sample is oftentimes reflected in rare mutations. Therefore, detecting rare variants prior to treatment may prove to be a useful predictor for therapeutic response. Current rare variant detection algorithms using next generation DNA sequencing are limited by inherent sequencing error rate and platform availability. Here we describe an optimized implementation of a rare variant detection algorithm called RVD for use in targeted gene resequencing. RVD is available both as a command-line program and for use in MATLAB and estimates context-specific error using a beta-binomial model to call variants with minor allele frequency (MAF) as low as 0.1%. We show that RVD accepts standard BAM formatted sequence files. We tested RVD analysis on multiple Illumina sequencing platforms, among the most widely used DNA sequencing platforms. RVD meets a growing need for highly sensitive and specific tools for variant detection. To demonstrate the usefulness of RVD, we carried out a thorough analysis of the software's performance on synthetic and clinical virus samples sequenced on both an Illumina GAIIx and a MiSeq. We expect RVD can improve understanding the genetics and treatment of common viral diseases including influenza. RVD is available at the following URL: http://dna-discovery.stanford.edu/software/rvd/.

    View details for DOI 10.1186/1756-0500-6-206

    View details for PubMedID 23701658

    View details for PubMedCentralID PMC3695852
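
    To make the beta-binomial idea above concrete, the sketch below computes how surprising an observed count of non-reference reads would be if the per-position error rate followed a beta distribution. It is a simplified illustration and not the RVD implementation: the function names, the decision threshold, and the example parameters are hypothetical, and RVD's actual estimation procedure is described in the paper itself.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X = k) when X ~ Binomial(n, p) and p ~ Beta(alpha, beta)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def upper_tail(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X >= k): chance of seeing k or more error reads out of n."""
    return sum(beta_binomial_pmf(i, n, alpha, beta) for i in range(k, n + 1))


def exceeds_error_model(k: int, n: int, alpha: float, beta: float,
                        threshold: float = 1e-6) -> bool:
    """True if k non-reference reads are implausible under the error model.

    threshold is an arbitrary illustrative cutoff, not a value from RVD.
    """
    return upper_tail(k, n, alpha, beta) < threshold


if __name__ == "__main__":
    # Hypothetical position: 2,000 reads, error model with mean ~0.05%.
    print(exceeds_error_model(k=12, n=2000, alpha=1.0, beta=2000.0))
```

    In practice alpha and beta would be fitted separately for each position from reference or control data, which is what makes the error estimate context-specific.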

  • DETECTING MUTATIONS IN MIXED SAMPLE SEQUENCING DATA USING EMPIRICAL BAYES ANNALS OF APPLIED STATISTICS Muralidharan, O., Natsoulis, G., Bell, J., Ji, H., Zhang, N. R. 2012; 6 (3): 1047-1067

    View details for DOI 10.1214/12-AOAS538

    View details for Web of Science ID 000314457400010

  • Identification of a novel deletion mutant strain in Saccharomyces cerevisiae that results in a microsatellite instability phenotype. BioDiscovery Ji, H. P., Morales, S., Welch, K., Yuen, C., Farnam, K., Ford, J. M. 2012

    Abstract

    The DNA mismatch repair (MMR) pathway corrects specific types of DNA replication errors that affect microsatellites and thus is critical for maintaining genomic integrity. The genes of the MMR pathway are highly conserved across different organisms. Likewise, defective MMR function universally results in microsatellite instability (MSI) which is a hallmark of certain types of cancer associated with the Mendelian disorder hereditary nonpolyposis colorectal cancer (Lynch syndrome). To identify previously unrecognized deleted genes or loci that can lead to MSI, we developed a functional genomics screen utilizing a plasmid containing a microsatellite sequence that is a hot spot for MSI mutations and the comprehensive homozygous diploid deletion mutant resource for Saccharomyces cerevisiae. This pool represents a collection of non-essential homozygous yeast diploid (2N) mutants in which there are deletions for over four thousand yeast open reading frames (ORFs). From our screen, we identified a deletion mutant strain of the PAU24 gene that leads to MSI. In a series of validation experiments, we determined that this PAU24 mutant strain had an increased MSI-specific mutation rate in comparison to the original background wildtype strain, other deletion mutants and comparable to a MMR mutant involving the MLH1 gene. Likewise, in yeast strains with a deletion of PAU24, we identified specific de novo indel mutations that occurred within the targeted microsatellite used for this screen.

    View details for PubMedID 23667739

  • Improving bioinformatic pipelines for exome variant calling GENOME MEDICINE Ji, H. P. 2012; 4

    Abstract

    Exome sequencing analysis is a cost-effective approach for identifying variants in coding regions. However, recognizing the relevant single nucleotide variants, small insertions and deletions remains a challenge for many researchers and diagnostic laboratories typically do not have access to the bioinformatic analysis pipelines necessary for clinical application. The Atlas2 suite, recently released by Baylor Genome Center, is designed to be widely accessible, runs on desktop computers but is scalable to computational clusters, and performs comparably with other popular variant callers. Atlas2 may be an accessible alternative for data processing when a rapid solution for variant calling is required. See research article http://www.biomedcentral.com/1471-2105/13/8.

    View details for DOI 10.1186/gm306

    View details for Web of Science ID 000314564600001

    View details for PubMedID 22289516

    View details for PubMedCentralID PMC3334555

  • The Human OligoGenome Resource: a database of oligonucleotide capture probes for resequencing target regions across the human genome. Nucleic acids research Newburger, D. E., Natsoulis, G., Grimes, S., Bell, J. M., Davis, R. W., Batzoglou, S., Ji, H. P. 2012; 40 (Database issue): D1137-43

    Abstract

    Recent exponential growth in the throughput of next-generation DNA sequencing platforms has dramatically spurred the use of accessible and scalable targeted resequencing approaches. This includes candidate region diagnostic resequencing and novel variant validation from whole genome or exome sequencing analysis. We have previously demonstrated that selective genomic circularization is a robust in-solution approach for capturing and resequencing thousands of target human genome loci such as exons and regulatory sequences. To facilitate the design and production of customized capture assays for any given region in the human genome, we developed the Human OligoGenome Resource (http://oligogenome.stanford.edu/). This online database contains over 21 million capture oligonucleotide sequences. It enables one to create customized and highly multiplexed resequencing assays of target regions across the human genome and is not restricted to coding regions. In total, this resource provides 92.1% in silico coverage of the human genome. The online server allows researchers to download a complete repository of oligonucleotide probes and design customized capture assays to target multiple regions throughout the human genome. The website has query tools for selecting and evaluating capture oligonucleotides from specified genomic regions.

    View details for DOI 10.1093/nar/gkr973

    View details for PubMedID 22102592

    View details for PubMedCentralID PMC3245143

  • Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY Lam, H. Y., Clark, M. J., Chen, R., Chen, R., Natsoulis, G., O'Huallachain, M., Dewey, F. E., Habegger, L., Ashley, E. A., Gerstein, M. B., Butte, A. J., Ji, H. P., Snyder, M. 2012; 30 (1): 78-U118

    Abstract

    Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

    View details for DOI 10.1038/nbt.2065

    View details for Web of Science ID 000299110600023
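
    The concordance figures quoted above can be read as set overlaps between the variant calls of two platforms. The sketch below shows one plausible way to compute such a figure, taking the denominator to be the union of unique calls; that reading, the tuple representation of a variant, and the example calls are all assumptions of this note rather than details from the paper.

```python
from typing import Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def concordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Fraction of all unique calls that both platforms report."""
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a & calls_b) / len(union)


def platform_specific(calls_a: Set[Variant],
                      calls_b: Set[Variant]) -> Set[Variant]:
    """Calls reported by exactly one of the two platforms."""
    return calls_a ^ calls_b


if __name__ == "__main__":
    # Hypothetical calls; not data from the study.
    platform_1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                  ("chr2", 50, "G", "A")}
    platform_2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                  ("chr3", 10, "T", "C")}
    print(concordance(platform_1, platform_2))
    print(sorted(platform_specific(platform_1, platform_2)))
```

    Indels are harder to compare this way because the same event can be represented at different positions, which may be one reason indel concordance is so much lower than SNV concordance.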

  • A cross-sample statistical model for SNP detection in short-read sequencing data NUCLEIC ACIDS RESEARCH Muralidharan, O., Natsoulis, G., Bell, J., Newburger, D., Xu, H., Kela, I., Ji, H., Zhang, N. 2012; 40 (1)

    Abstract

    Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.

    View details for DOI 10.1093/nar/gkr851

    View details for Web of Science ID 000298733500005

    View details for PubMedID 22064853

    View details for PubMedCentralID PMC3245949

  • The Human OligoGenome Resource: a database of oligonucleotide capture probes for resequencing target regions across the human genome NUCLEIC ACIDS RESEARCH Newburger, D. E., Natsoulis, G., Grimes, S., Bell, J. M., Davis, R. W., Batzoglou, S., Ji, H. P. 2012; 40 (D1): D1137-D1143

    View details for DOI 10.1093/nar/gkr973

    View details for Web of Science ID 000298601300170

    View details for PubMedCentralID PMC3245143

  • Quantitative and Sensitive Detection of Cancer Genome Amplifications from Formalin Fixed Paraffin Embedded Tumors with Droplet Digital PCR. Translational medicine (Sunnyvale, Calif.) Nadauld, L., Regan, J. F., Miotke, L., Pai, R. K., Longacre, T. A., Kwok, S. S., Saxonov, S., Ford, J. M., Ji, H. P. 2012; 2 (2)

    Abstract

    For the analysis of cancer, there is great interest in rapid and accurate detection of cancer genome amplifications containing oncogenes that are potential therapeutic targets. The vast majority of cancer tissue samples are formalin fixed and paraffin embedded (FFPE) which enables histopathological examination and long term archiving. However, FFPE cancer genomic DNA is oftentimes degraded and generally a poor substrate for many molecular biology assays. To overcome the issues of poor DNA quality from FFPE samples and detect oncogenic copy number amplifications with high accuracy and sensitivity, we developed a novel approach. Our assay requires nanogram amounts of genomic DNA, thus facilitating study of small amounts of clinical samples. Using droplet digital PCR (ddPCR), we can determine the relative copy number of specific genomic loci even in the presence of intermingled normal tissue. We used a control dilution series to determine the limits of detection for the ddPCR assay and report its improved sensitivity on minimal amounts of DNA compared to standard real-time PCR. To develop this approach, we designed an assay for the fibroblast growth factor receptor 2 gene (FGFR2) that is amplified in gastric and breast cancers as well as others. We successfully utilized ddPCR to ascertain FGFR2 amplifications from FFPE-preserved gastrointestinal adenocarcinomas.

    View details for PubMedID 23682346

  • Ultrasensitive detection of rare mutations using next-generation targeted resequencing NUCLEIC ACIDS RESEARCH Flaherty, P., Natsoulis, G., Muralidharan, O., Winters, M., Buenrostro, J., Bell, J., Brown, S., Holodniy, M., Zhang, N., Ji, H. P. 2012; 40 (1)

    Abstract

    With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify less prevalent, rare mutations in heterogeneous clinical samples. However, the mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation. This represents accurate detection of one mutant per every 1000 wild-type alleles. To achieve this sensitive level of mutation detection, we integrate a high accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. Our method is highly specific (99%) and sensitive (100%) when applied to a known 0.1% sample fraction admixture of two synthetic DNA samples to validate our method. As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.

    View details for DOI 10.1093/nar/gkr861

    View details for Web of Science ID 000298733500002

    View details for PubMedID 22013163

    View details for PubMedCentralID PMC3245950

  • Targeted sequencing library preparation by genomic DNA circularization BMC BIOTECHNOLOGY Myllykangas, S., Natsoulis, G., Bell, J. M., Ji, H. P. 2011; 11

    Abstract

    For next generation DNA sequencing, we have developed a rapid and simple approach for preparing DNA libraries of targeted DNA content. Current protocols for preparing DNA for next-generation targeted sequencing are labor-intensive, require large amounts of starting material, and are prone to artifacts that result from necessary PCR amplification of sequencing libraries. Typically, sample preparation for targeted NGS is a two-step process where (1) the desired regions are selectively captured and (2) the ends of the DNA molecules are modified to render them compatible with any given NGS sequencing platform. In this proof-of-concept study, we present an integrated approach that combines these two separate steps into one. Our method involves circularization of a specific genomic DNA molecule that directly incorporates the necessary components for conducting sequencing in a single assay and requires only one PCR amplification step. We also show that specific regions of the genome can be targeted and sequenced without any PCR amplification. We anticipate that these rapid targeted libraries will be useful for validation of variants and may have diagnostic application.

    View details for DOI 10.1186/1472-6750-11-122

    View details for Web of Science ID 000300427900001

    View details for PubMedID 22168766

    View details for PubMedCentralID PMC3280942

  • Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing NATURE BIOTECHNOLOGY Myllykangas, S., Buenrostro, J. D., Natsoulis, G., Bell, J. M., Ji, H. P. 2011; 29 (11): 1024-U95

    Abstract

    We describe an approach for targeted genome resequencing, called oligonucleotide-selective sequencing (OS-Seq), in which we modify the immobilized lawn of oligonucleotide primers of a next-generation DNA sequencer to function as both a capture and sequencing substrate. We apply OS-Seq to resequence the exons of either 10 or 344 cancer genes from human DNA samples. In our assessment of capture performance, >87% of the captured sequence originated from the intended target region with sequencing coverage falling within a tenfold range for a majority of all targets. Single nucleotide variants (SNVs) called from OS-Seq data agreed with >95% of variants obtained from whole-genome sequencing of the same individual. We also demonstrate mutation discovery from a colorectal cancer tumor sample matched with normal tissue. Overall, we show the robust performance and utility of OS-Seq for the resequencing analysis of human germline and cancer genomes.

    View details for DOI 10.1038/nbt.1996

    View details for Web of Science ID 000296801300024

    View details for PubMedID 22020387

  • A Flexible Approach for Highly Multiplexed Candidate Gene Targeted Resequencing PLOS ONE Natsoulis, G., Bell, J. M., Xu, H., Buenrostro, J. D., Ordonez, H., Grimes, S., Newburger, D., Jensen, M., Zahn, J. M., Zhang, N., Ji, H. P. 2011; 6 (6)

    Abstract

    We have developed an integrated strategy for targeted resequencing and analysis of gene subsets from the human exome for variants. Our capture technology is geared towards resequencing gene subsets substantially larger than can be done efficiently with simplex or multiplex PCR but smaller in scale than exome sequencing. We describe all the steps from the initial capture assay to single nucleotide variant (SNV) discovery. The capture methodology uses in-solution 80-mer oligonucleotides. To provide optimal flexibility in choosing human gene targets, we designed an in silico set of oligonucleotides, the Human OligoExome, that covers the gene exons annotated by the Consensus Coding Sequencing Project (CCDS). This resource is openly available as an Internet accessible database where one can download capture oligonucleotides sequences for any CCDS gene and design custom capture assays. Using this resource, we demonstrated the flexibility of this assay by custom designing capture assays ranging from 10 to over 100 gene targets with total capture sizes from over 100 Kilobases to nearly one Megabase. We established a method to reduce capture variability and incorporated indexing schemes to increase sample throughput. Our approach has multiple applications that include but are not limited to population targeted resequencing studies of specific gene subsets, validation of variants discovered in whole genome sequencing surveys and possible diagnostic analysis of disease gene subsets. We also present a cost analysis demonstrating its cost-effectiveness for large population studies.

    View details for DOI 10.1371/journal.pone.0021088

    View details for Web of Science ID 000292291800008

    View details for PubMedID 21738606

    View details for PubMedCentralID PMC3127857

  • Genetic-based biomarkers and next-generation sequencing: the future of personalized care in colorectal cancer PERSONALIZED MEDICINE Kim, R. Y., Xu, H., Myllykangas, S., Ji, H. 2011; 8 (3): 331-345

    Abstract

    The past 5 years have witnessed extraordinary advances in the field of DNA sequencing technology. What once took years to accomplish with Sanger sequencing can now be accomplished in a matter of days with next-generation sequencing (NGS) technology. This has allowed researchers to sequence individual genomes and match combinations of mutations with specific diseases. As cancer is inherently a disease of the genome, it is not surprising to see NGS technology already being applied to cancer research with promises of greater understanding of carcinogenesis. While the task of deciphering the cancer genomic code remains ongoing, we are already beginning to see the application of genetic-based testing in the area of colorectal cancer. In this article we will provide an overview of current colorectal cancer genetic-based biomarkers, namely mutations and other genetic alterations in cancer genome DNA, discuss recent advances in NGS technology and speculate on future directions for the application of NGS technology to colorectal cancer diagnosis and treatment.

    View details for DOI 10.2217/PME.11.16

    View details for Web of Science ID 000291444800013

    View details for PubMedID 23662107

    View details for PubMedCentralID PMC3646399

  • Identification of Novel LNK Mutations In Patients with Chronic Myeloproliferative Neoplasms and Related Disorders 52nd Annual Meeting and Exposition of the American-Society-of-Hematology (ASH) Oh, S. T., Zahn, J. M., Jones, C. D., Zhang, B., Loh, M. L., Kantarjian, H., Simonds, E. F., Bruggner, R. V., Abidi, P., Natsoulis, G., Bell, J., Buenrostro, J., Nolan, G. P., Zehnder, J. L., Ji, H. P., Gotlib, J. AMER SOC HEMATOLOGY. 2010: 143–44
  • Detecting simultaneous changepoints in multiple sequences BIOMETRIKA Zhang, N. R., Siegmund, D. O., Ji, H., Li, J. Z. 2010; 97 (3): 631-645

    Abstract

    We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.

    View details for DOI 10.1093/biomet/asq025

    View details for Web of Science ID 000280904000008

    View details for PubMedID 22822250

    View details for PubMedCentralID PMC3372242
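
    The scan idea described in the abstract above (summing a chi-squared statistic across samples for each candidate interval) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each sequence has already been standardized to zero mean and unit variance under the null, uses a plain exhaustive search over windows, and the names `sum_chisq_scan` and `max_width` are hypothetical.

```python
def sum_chisq_scan(sequences, max_width):
    """Exhaustive scan for a signal shared across several sequences.

    For every window [start, end) up to max_width wide, each sequence
    contributes (window sum)^2 / width, which is chi-squared with one degree
    of freedom under the assumed null (independent, zero-mean, unit-variance
    noise). The contributions are summed over sequences and the window with
    the largest total is returned as (statistic, start, end).
    """
    n_pos = len(sequences[0])

    # Prefix sums let each window sum be computed in constant time.
    prefix = []
    for seq in sequences:
        acc = [0.0]
        for x in seq:
            acc.append(acc[-1] + x)
        prefix.append(acc)

    best = (0.0, 0, 0)
    for start in range(n_pos):
        last_end = min(n_pos, start + max_width)
        for end in range(start + 1, last_end + 1):
            width = end - start
            stat = sum((p[end] - p[start]) ** 2 / width for p in prefix)
            if stat > best[0]:
                best = (stat, start, end)
    return best


# Toy data (invented for illustration): four noiseless sequences of
# length 20, two of which carry a shift of 2.0 at positions 8 to 11.
carrier = [0.0] * 8 + [2.0] * 4 + [0.0] * 8
flat = [0.0] * 20
toy_result = sum_chisq_scan([flat, carrier, carrier, flat], max_width=10)
print(toy_result)
```

    In this toy case the shared interval is recovered even though only half of the sequences carry the signal, which is the situation the abstract highlights. Assessing significance would require the analytic approximations derived in the paper, which this sketch does not attempt.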

  • Oncogenic BRAF Mutation with CDKN2A Inactivation Is Characteristic of a Subset of Pediatric Malignant Astrocytomas CANCER RESEARCH Schiffman, J. D., Hodgson, J. G., VandenBerg, S. R., Flaherty, P., Polley, M. C., Yu, M., Fisher, P. G., Rowitch, D. H., Ford, J. M., Berger, M. S., Ji, H., Gutmann, D. H., James, C. D. 2010; 70 (2): 512-519

    Abstract

    Malignant astrocytomas are a deadly solid tumor in children. Limited understanding of their underlying genetic basis has contributed to modest progress in developing more effective therapies. In an effort to identify such alterations, we performed a genome-wide search for DNA copy number aberrations (CNA) in a panel of 33 tumors encompassing grade 1 through grade 4 tumors. Genomic amplifications of 10-fold or greater were restricted to grade 3 and 4 astrocytomas and included the MDM4 (1q32), PDGFRA (4q12), MET (7q21), CMYC (8q24), PVT1 (8q24), WNT5B (12p13), and IGF1R (15q26) genes. Homozygous deletions of CDKN2A (9p21), PTEN (10q26), and TP53 (17p13.1) were evident among grade 2 to 4 tumors. BRAF gene rearrangements that were indicated in three tumors prompted the discovery of KIAA1549-BRAF fusion transcripts expressed in 10 of 10 grade 1 astrocytomas and in none of the grade 2 to 4 tumors. In contrast, an oncogenic missense BRAF mutation (BRAF(V600E)) was detected in 7 of 31 grade 2 to 4 tumors but in none of the grade 1 tumors. BRAF(V600E) mutation seems to define a subset of malignant astrocytomas in children, in which there is frequent concomitant homozygous deletion of CDKN2A (five of seven cases). Taken together, these findings highlight BRAF as a frequent mutation target in pediatric astrocytomas, with distinct types of BRAF alteration occurring in grade 1 versus grade 2 to 4 tumors.

    View details for DOI 10.1158/0008-5472.CAN-09-1851

    View details for Web of Science ID 000278485500011

    View details for PubMedID 20068183

    View details for PubMedCentralID PMC2851233

  • Targeted deep resequencing of the human cancer genome using next-generation technologies BIOTECHNOLOGY AND GENETIC ENGINEERING REVIEWS, VOL 27 Myllykangas, S., Ji, H. P. 2010; 27: 135-158

    Abstract

    Next-generation sequencing technologies have revolutionized our ability to identify genetic variants, either germline or somatic point mutations, that occur in cancer. Parallelization and miniaturization of DNA sequencing enables massive data throughput and for the first time, large-scale, nucleotide resolution views of cancer genomes can be achieved. Systematic, large-scale sequencing surveys have revealed that the genetic spectrum of mutations in cancers appears to be highly complex with numerous low frequency bystander somatic variations, and a limited number of common, frequently mutated genes. Large sample sizes and deeper resequencing are much needed in resolving clinical and biological relevance of the mutations as well as in detecting somatic variants in heterogeneous samples and cancer cell sub-populations. However, even with the next-generation sequencing technologies, the overwhelming size of the human genome and need for very high fold coverage represents a major challenge for up-scaling cancer genome sequencing projects. Assays to target, capture, enrich or partition disease-specific regions of the genome offer immediate solutions for reducing the complexity of the sequencing libraries. Integration of targeted DNA capture assays and next-generation deep resequencing improves the ability to identify clinically and biologically relevant mutations.

    View details for Web of Science ID 000286179900006

    View details for PubMedID 21415896

  • Identification of a biomarker panel using a multiplex proximity ligation assay improves accuracy of pancreatic cancer diagnosis JOURNAL OF TRANSLATIONAL MEDICINE Chang, S. T., Zahn, J. M., Horecka, J., Kunz, P. L., Ford, J. M., Fisher, G. A., Le, Q. T., Chang, D. T., Ji, H., Koong, A. C. 2009; 7

    Abstract

    Pancreatic cancer continues to prove difficult to clinically diagnose. Multiple simultaneous measurements of plasma biomarkers can increase sensitivity and selectivity of diagnosis. Proximity ligation assay (PLA) is a highly sensitive technique for multiplex detection of biomarkers in plasma with little or no interfering background signal. We examined the plasma levels of 21 biomarkers in a clinically defined cohort of 52 locally advanced (Stage II/III) pancreatic ductal adenocarcinoma cases and 43 age-matched controls using a multiplex proximity ligation assay. The optimal biomarker panel for diagnosis was computed using a combination of the PAM algorithm and logistic regression modeling. Biomarkers that were significantly prognostic for survival in combination were determined using univariate and multivariate Cox survival models. Three markers, CA19-9, OPN and CHI3L1, measured in multiplex were found to have superior sensitivity for pancreatic cancer vs. CA19-9 alone (93% vs. 80%). In addition, we identified two markers, CEA and CA125, that when measured simultaneously have prognostic significance for survival for this clinical stage of pancreatic cancer (p < 0.003). A multiplex panel assaying CA19-9, OPN and CHI3L1 in plasma improves accuracy of pancreatic cancer diagnosis. A panel assaying CEA and CA125 in plasma can predict survival for this clinical cohort of pancreatic cancer patients.

    View details for DOI 10.1186/1479-5876-7-105

    View details for PubMedID 20003342

  • ASSOCIATION OF 7Q34 COPY NUMBER GAINS AND KIAA1549-BRAF GENE FUSIONS WITH JUVENILE PILOCYTIC ASTROCYTOMA Hodgson, J., VandenBerg, S. R., James, C., Perry, A., Gutmann, D., Fisher, P., Ford, J., Ji, H., Schiffman, J. OXFORD UNIV PRESS INC. 2009: 960
  • Molecular inversion probes reveal patterns of 9p21 deletion and copy number aberrations in childhood leukemia CANCER GENETICS AND CYTOGENETICS Schiffman, J. D., Wang, Y., McPherson, L. A., Welch, K., Zhang, N., Davis, R., Lacayo, N. J., Dahl, G. V., Faham, M., Ford, J. M., Ji, H. P. 2009; 193 (1): 9-18

    Abstract

    Childhood leukemia, which accounts for >30% of newly diagnosed childhood malignancies, is one of the leading causes of death for children with cancer. Genome-wide studies using microarray chips to identify copy number changes in human cancer are becoming more common. In this pilot study, 45 pediatric leukemia samples were analyzed for gene copy aberrations using novel molecular inversion probe (MIP) technology. Acute leukemia subtypes included precursor B-cell acute lymphoblastic leukemia (ALL) (n=23), precursor T-cell ALL (n=6), and acute myeloid leukemia (n=14). The MIP analysis identified 69 regions of recurring copy number changes, of which 41 have not been identified with other DNA microarray platforms. Copy number gains and losses were validated in 98% of clinical karyotypes and 100% of fluorescence in situ hybridization studies available. We report unique patterns of copy number loss in samples with 9p21.3 (CDKN2A) deletion in the precursor B-cell ALL patients, compared with the precursor T-cell ALL patients. MIPs represent an attractive technology for identifying novel copy number aberrations, validating previously reported copy number changes, and translating molecular findings into clinically relevant targets for further investigation.

    View details for DOI 10.1016/j.cancergencyto.2009.03.005

    View details for Web of Science ID 000268922900002

    View details for PubMedID 19602459

    View details for PubMedCentralID PMC2776674

  • Paired phospho-proteomic and genomic analyses reveal functionally distinct subclones in refractory pediatric acute myeloid leukemia Simonds, E., Schiffman, J., Gramatges, M., Dahl, G., Ford, J., Lacayo, N., Ji, H., Nolan, G. AMER ASSOC CANCER RESEARCH. 2009
  • Disperse-a software system for design of selector probes for exon resequencing applications BIOINFORMATICS Stenberg, J., Zhang, M., Ji, H. 2009; 25 (5): 666-667

    Abstract

    Selector probes enable the amplification of many selected regions of the genome in multiplex. Disperse is a software pipeline that automates the procedure of designing selector probes for exon resequencing applications. Software and documentation are available at http://bioinformatics.org/disperse

    View details for DOI 10.1093/bioinformatics/btp001

    View details for Web of Science ID 000263834600018

    View details for PubMedID 19158162

    View details for PubMedCentralID PMC2647824

  • Molecular inversion probe assay for allelic quantitation. Methods in molecular biology (Clifton, N.J.) Ji, H., Welch, K. 2009; 556: 67-87

    Abstract

    Molecular inversion probe (MIP) technology has been demonstrated to be a robust platform for large-scale dual genotyping and copy number analysis. Applications in human genomic and genetic studies include the possibility of running dual germline genotyping and combined copy number variation ascertainment. MIPs analyze large numbers of specific genetic target sequences in parallel, relying on interrogation of a barcode tag, rather than direct hybridization of genomic DNA to an array. The MIP approach does not replace, but is complementary to many of the copy number technologies being performed today. Some specific advantages of MIP technology include: less DNA required (37 ng vs. 250 ng), DNA quality less important, more dynamic range (amplifications detected up to copy number 60), allele-specific information "cleaner" (less SNP cross-talk/contamination), and quality of markers better (fewer individual MIPs versus SNPs needed to identify copy number changes). MIPs can be considered a candidate gene (targeted whole genome) approach and can find specific areas of interest that otherwise may be missed with other methods.

    View details for DOI 10.1007/978-1-60327-192-9_6

    View details for PubMedID 19488872

    View details for PubMedCentralID PMC2988579

  • FOXM1 OVEREXPRESSION AND DNA AMPLIFICATION IN PEDIATRIC ASTROCYTOMAS Hodgson, G., Vandenberg, S., Fisher, P., Yu, M., James, C., Rowitch, D., Ford, J., Ji, H., Schiffman, J. OXFORD UNIV PRESS INC. 2008: 805–6
  • Next-generation DNA sequencing NATURE BIOTECHNOLOGY Shendure, J., Ji, H. 2008; 26 (10): 1135-1145

    Abstract

    DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.

    View details for DOI 10.1038/nbt1486

    View details for Web of Science ID 000259926000028

    View details for PubMedID 18846087

  • Analysis of Genomic Instability in Colorectal Carcinoma Flaherty, P., Davis, R. W., Ji, H. FEDERATION AMER SOC EXP BIOL. 2008
  • Gene-specific delineation of copy number aberrations in follicular lymphoma with molecular inversion probes 49th Annual Meeting of the American-Society-of-Hematology Ji, H. P., Welch, K. M., Wang, Y., Faham, M., Akasaka, T., Czerwinski, D., Davis, R. W., Levy, R. AMER SOC HEMATOLOGY. 2007: 766A–767A
  • Molecular Inversion Probes (MIPs) identify novel areas of allelic imbalance in childhood leukemia Schiffman, J. D., Welch, K., Davis, R., Lacayo, N. J., Dahl, G. V., Wang, Y., Faham, M., Ford, J. M., Ji, H. P. AMER SOC HEMATOLOGY. 2007: 431A
  • Adapting molecular inversion probe (MIP) technology for allele quantification in childhood leukemia Schiffman, J. D., Welch, K. M., Davis, R., Dahl, G. V., Lacayo, N. J., Faham, M., Ford, J. M., Ji, H. AMER SOC CLINICAL ONCOLOGY. 2007
  • Multigene amplification and massively parallel sequencing for cancer mutation discovery PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Dahl, F., Stenberg, J., Fredriksson, S., Welch, K., Zhang, M., Nilsson, M., Bicknell, D., Bodmer, W. F., Davis, R. W., Ji, H. 2007; 104 (22): 9387-9392

    Abstract

    We have developed a procedure for massively parallel resequencing of multiple human genes by combining a highly multiplexed and target-specific amplification process with a high-throughput parallel sequencing technology. The amplification process is based on oligonucleotide constructs, called selectors, that guide the circularization of specific DNA target regions. Subsequently, the circularized target sequences are amplified in multiplex and analyzed by using a highly parallel sequencing-by-synthesis technology. As a proof-of-concept study, we demonstrate parallel resequencing of 10 cancer genes covering 177 exons with average sequence coverage per sample of 93%. Seven cancer cell lines and one normal genomic DNA sample were studied with multiple mutations and polymorphisms identified among the 10 genes. Mutations and polymorphisms in the TP53 gene were confirmed by traditional sequencing.

    View details for DOI 10.1073/pnas.0702165104

    View details for Web of Science ID 000246935700055

    View details for PubMedID 17517648

    View details for PubMedCentralID PMC1871563

  • Multiplex amplification of all coding sequences within 10 cancer genes by Gene-Collector NUCLEIC ACIDS RESEARCH Fredriksson, S., Baner, J., Dahl, F., Chu, A., Ji, H., Welch, K., Davis, R. W. 2007; 35 (7)

    Abstract

    Herein we present Gene-Collector, a method for multiplex amplification of nucleic acids. The procedure has been employed to successfully amplify the coding sequence of 10 human cancer genes in one assay with uniform abundance of the final products. Amplification is initiated by a multiplex PCR in this case with 170 primer pairs. Each PCR product is then specifically circularized by ligation on a Collector probe capable of juxtapositioning only the perfectly matched cognate primer pairs. Any amplification artifacts typically associated with multiplex PCR derived from the use of many primer pairs such as false amplicons, primer-dimers etc. are not circularized and degraded by exonuclease treatment. Circular DNA molecules are then further enriched by randomly primed rolling circle replication. Amplification was successful for 90% of the targeted amplicons as seen by hybridization to a custom resequencing DNA micro-array. Real-time quantitative PCR revealed that 96% of the amplification products were all within 4-fold of the average abundance. Gene-Collector has utility for numerous applications such as high throughput resequencing, SNP analyses, and pathogen detection.

    View details for DOI 10.1093/nar/gkm078

    View details for Web of Science ID 000246294700001

    View details for PubMedID 17317684

    View details for PubMedCentralID PMC1874629

  • Multiplexed protein detection by proximity ligation for cancer biomarker validation NATURE METHODS Fredriksson, S., Dixon, W., Ji, H., Koong, A. C., Mindrinos, M., Davis, R. W. 2007; 4 (4): 327-329

    Abstract

    We present a proximity ligation-based multiplexed protein detection procedure in which several selected proteins can be detected via unique nucleic-acid identifiers and subsequently quantified by real-time PCR. The assay requires a 1-microl sample, has low-femtomolar sensitivity as well as five-log linear range and allows for modular multiplexing without cross-reactivity. The procedure can use a single polyclonal antibody batch for each target protein, simplifying affinity-reagent creation for new biomarker candidates.

    View details for DOI 10.1038/NMETH1020

    View details for Web of Science ID 000245584900013

    View details for PubMedID 17369836

  • Under-expression of Kalirin-7 increases iNOS activity in cultured cells and correlates to elevated iNOS activity in Alzheimer's disease hippocampus JOURNAL OF ALZHEIMERS DISEASE Youn, H., Ji, I., Ji, H. P., Markesbery, W. R., Ji, T. H. 2007; 12 (3): 271-281

    Abstract

    Recently, it has been reported that Kalirin gene transcripts are under-expressed in AD hippocampal specimens compared to the controls. The Kalirin gene generates a dozen Kalirin isoforms. Kalirin-7 is the predominant protein expressed in the adult brain and plays crucial roles in growth and maintenance of neurons. Yet its role in human diseases is unknown. We report that Kalirin-7 is significantly diminished both at the mRNA and protein levels in the hippocampus specimens from 19 AD patients compared to the specimens from 15 controls. Kalirin-7 associates with iNOS in the hippocampus, and therefore, Kalirin-7 is complexed with iNOS less in AD hippocampus extracts than in control hippocampus extracts. In cultured cells, Kalirin-7 associates with iNOS and down-regulates the enzyme activity. The down-regulation is attributed to the highly conserved 33 amino acid sequence, K(617)-H(649), of the 1,663 amino acids long Kalirin-7. Remarkably, the iNOS activity is considerably higher in the hippocampus specimens from AD patients than the specimens from 15 controls. These observations suggest that the under-expression of Kalirin-7 in AD hippocampus correlates to the elevated iNOS activity.

    View details for Web of Science ID 000252300000009

    View details for PubMedID 18057561

  • Reproducibility Probability Score - incorporating measurement variability across laboratories for gene selection NATURE BIOTECHNOLOGY Lin, G., He, X., Ji, H., Shi, L., Davis, R. W., Zhong, S. 2006; 24 (12): 1476-1477

    View details for Web of Science ID 000242795800015

    View details for PubMedID 17160039

  • Data quality in genomics and microarrays NATURE BIOTECHNOLOGY Ji, H., Davis, R. W. 2006; 24 (9): 1112-1113

    View details for DOI 10.1038/nbt0906-1108

    View details for Web of Science ID 000240495200031

    View details for PubMedID 16964224

    View details for PubMedCentralID PMC2943412

  • The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements NATURE BIOTECHNOLOGY Shi, L., Reid, L. H., Jones, W. D., Shippy, R., Warrington, J. A., Baker, S. C., Collins, P. J., de Longueville, F., Kawasaki, E. S., Lee, K. Y., Luo, Y., Sun, Y. A., Willey, J. C., Setterquist, R. A., Fischer, G. M., Tong, W., Dragan, Y. P., Dix, D. J., Frueh, F. W., Goodsaid, F. M., Herman, D., Jensen, R. V., Johnson, C. D., Lobenhofer, E. K., Puri, R. K., Scherf, U., Thierry-Mieg, J., Wang, C., Wilson, M., Wolber, P. K., Zhang, L., Amur, S., Bao, W., Barbacioru, C. C., Lucas, A. B., Bertholet, V., Boysen, C., Bromley, B., Brown, D., Brunner, A., Canales, R., Cao, X. M., Cebula, T. A., Chen, J. J., Cheng, J., Chu, T., Chudin, E., Corson, J., Corton, J. C., Croner, L. J., Davies, C., Davison, T. S., Delenstarr, G., Deng, X., Dorris, D., Eklund, A. C., Fan, X., Fang, H., Fulmer-Smentek, S., Fuscoe, J. C., Gallagher, K., Ge, W., Guo, L., Guo, X., Hager, J., Haje, P. K., Han, J., Han, T., Harbottle, H. C., Harris, S. C., Hatchwell, E., Hauser, C. A., Hester, S., Hong, H., Hurban, P., Jackson, S. A., Ji, H., Knight, C. R., Kuo, W. P., LeClerc, J. E., Levy, S., Li, Q., Liu, C., Liu, Y., Lombardi, M. J., Ma, Y., Magnuson, S. R., Maqsodi, B., McDaniel, T., Mei, N., Myklebost, O., Ning, B., Novoradovskaya, N., Orr, M. S., Osborn, T. W., Papallo, A., Patterson, T. A., Perkins, R. G., Peters, E. H., Peterson, R., Philips, K. L., Pine, P. S., Pusztai, L., Qian, F., Ren, H., Rosen, M., Rosenzweig, B. A., Samaha, R. R., Schena, M., Schroth, G. P., Shchegrova, S., Smith, D. D., Staedtler, F., Su, Z., Sun, H., Szallasi, Z., Tezak, Z., Thierry-Mieg, D., Thompson, K. L., Tikhonova, I., Turpaz, Y., Vallanat, B., Van, C., Walker, S. J., Wang, S. J., Wang, Y., Wolfinger, R., Wong, A., Wu, J., Xiao, C., Xie, Q., Xu, J., Yang, W., Zhang, L., Zhong, S., Zong, Y., Slikker, W. 2006; 24 (9): 1151-1161

    Abstract

    Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

    View details for DOI 10.1038/nbt1239

    View details for Web of Science ID 000240495200036

    View details for PubMedID 16964229

    View details for PubMedCentralID PMC3272078

  • Molecular inversion probe analysis of gene copy alterations reveals distinct categories of colorectal carcinoma CANCER RESEARCH Ji, H., Kumm, J., Zhang, M., Farnam, K., Salari, K., Faham, M., Ford, J. M., Davis, R. W. 2006; 66 (16): 7910-7919

    Abstract

    Genomic instability is a major feature of neoplastic development in colorectal carcinoma and other cancers. Specific genomic instability events, such as deletions in chromosomes and other alterations in gene copy number, have potential utility as biologically relevant prognostic biomarkers. For example, genomic deletions on chromosome arm 18q are an indicator of colorectal carcinoma behavior and potentially useful as a prognostic indicator. Adapting a novel genomic technology called molecular inversion probes which can determine gene copy alterations, such as genomic deletions, we designed a set of probes to interrogate several hundred individual exons of >200 cancer genes with an overall distribution covering all chromosome arms. In addition, >100 probes were designed in close proximity of microsatellite markers on chromosome arm 18q. We analyzed a set of colorectal carcinoma cell lines and primary colorectal tumor samples for gene copy alterations and deletion mutations in exons. Based on clustering analysis, we distinguished the different categories of genomic instability among the colorectal cancer cell lines. Our analysis of primary tumors uncovered several distinct categories of colorectal carcinoma, each with specific patterns of 18q deletions and deletion mutations in specific genes. This finding has potential clinical ramifications given the application of 18q loss of heterozygosity events as a potential indicator for adjuvant treatment in stage II colorectal carcinoma.

    View details for DOI 10.1158/0008-5472.CAN-06-0595

    View details for PubMedID 16912164

  • Analysis of genomic DNA copy number alterations in chromosome arm 18q demonstrates distinct molecular categories of colorectal carcinoma. Ji, H., Zhang, M., Farnam, K., Salari, K., Davis, R., Ford, J. M. AMER SOC CLINICAL ONCOLOGY. 2006: 542S
  • A functional assay for mutations in tumor suppressor genes caused by mismatch repair deficiency HUMAN MOLECULAR GENETICS Ji, H. P., King, M. C. 2001; 10 (24): 2737-2743

    Abstract

    The coding sequences of multiple human tumor suppressor genes include microsatellite sequences that are prone to mutations. Saccharomyces cerevisiae strains deficient in DNA mismatch repair (MMR) can be used to determine de novo mutation rates of these human tumor suppressor genes as well as any other gene sequence. Microsatellites in human TGFBR2, PTEN and APC genes were placed in yeast vectors and analyzed in isogenic yeast strains that were wild-type or deletion mutants for MSH2 or MLH1. In MMR-deficient strains, the vector containing the (A)₁₀ microsatellite sequence of TGFBR2 had a mutation rate (mutations/cell division) of 1.4 × 10⁻⁴, compared to a mutation rate of 1.7 × 10⁻⁶ in the wild-type strain. In MMR-deficient strains, mutation rates in PTEN and APC were also elevated above background levels. PTEN mutation rates were higher in both msh2 (4.4 × 10⁻⁵) and mlh1 strains (2.3 × 10⁻⁵). APC mutation rates in the msh2 strain (2.4 × 10⁻⁶) and the mlh1 strain (1.7 × 10⁻⁶) were also significantly, but less dramatically, elevated over background. Mutations selected for in the yeast screen were identical to those previously observed in human tumor samples with microsatellite instability (MSI). This functional assay has applicability in providing quantitative data about microsatellite mutation rates caused by MMR deficiency in any human tumor suppressor gene sequence. It can also be applied as a genetic screen to identify new genes that are vulnerable to such microsatellite mutations and thus may be involved in the neoplastic development of tumors with MSI.

    View details for Web of Science ID 000172867500001

    View details for PubMedID 11734538
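
    As a small worked example of the comparison reported in the abstract above, the fold elevation of the TGFBR2 microsatellite mutation rate in MMR-deficient yeast over wild-type can be computed directly from the two quoted rates. The helper name `fold_increase` is hypothetical, and only the TGFBR2 pair is used because the abstract gives a wild-type rate for that construct alone.

```python
def fold_increase(mutant_rate, wild_type_rate):
    """Ratio of two mutation rates (mutations per cell division)."""
    if wild_type_rate <= 0:
        raise ValueError("wild-type rate must be positive")
    return mutant_rate / wild_type_rate


# Rates quoted in the abstract for the TGFBR2 (A)10 microsatellite.
tgfbr2_mmr_deficient = 1.4e-4
tgfbr2_wild_type = 1.7e-6

tgfbr2_fold = fold_increase(tgfbr2_mmr_deficient, tgfbr2_wild_type)
print(f"TGFBR2: roughly {tgfbr2_fold:.0f}-fold elevation")
```

    The ratio comes to a little over 80-fold, which gives a sense of scale for the "less dramatically" elevated PTEN and APC rates mentioned in the same abstract.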

  • Spondyloepimetaphyseal dysplasia with joint laxity (SEMDJL): Presentation in two unrelated patients in the United States AMERICAN JOURNAL OF MEDICAL GENETICS Smith, W., Ji, H. L., Mouradian, W., Pagon, R. A. 1999; 86 (3): 245-252

    Abstract

    This is a report of two North American patients with spondyloepimetaphyseal dysplasia with joint laxity, an uncommon autosomal recessive skeletal dysplasia rarely reported outside of South Africa. Patients with SEMDJL have vertebral abnormalities and ligamentous laxity that results in spinal misalignment and progressive severe kyphoscoliosis, thoracic asymmetry, and respiratory compromise resulting in early death. Nonaxial skeletal involvement includes elbow deformities with radial head dislocation, dislocated hips, clubbed feet, and tapered fingers with spatulate distal phalanges. Many affected children have an oval face, flat midface, prominent eyes with blue sclerae, and a long philtrum. Palatal abnormalities and congenital heart disease are also observed. Diagnosis in infancy may be difficult because many of the typical findings are not apparent early and only evolve over time. We review the physical and radiographic findings in two unrelated patients with this disorder in order to increase the awareness of this disorder, particularly for clinicians outside of South Africa.

    View details for Web of Science ID 000082714300010

    View details for PubMedID 10482874

  • Molecular classification of the inherited hamartoma polyposis syndromes: Clearing the muddied waters AMERICAN JOURNAL OF HUMAN GENETICS Eng, C., Ji, H. L. 1998; 62 (5): 1020-1022

    View details for Web of Science ID 000073487000004

    View details for PubMedID 9545417

  • Inherited mutations in PTEN that are associated with breast cancer, Cowden disease, and juvenile polyposis AMERICAN JOURNAL OF HUMAN GENETICS Lynch, E. D., Ostermeyer, E. A., Lee, M. K., Arena, J. F., Ji, H. L., Dann, J., Swisshelm, K., Suchard, D., MacLeod, P. M., Kvinnsland, S., Gjertsen, B. T., Heimdal, K., Lubs, H., Moller, P., King, M. C. 1997; 61 (6): 1254-1260

    Abstract

    PTEN, a protein tyrosine phosphatase with homology to tensin, is a tumor-suppressor gene on chromosome 10q23. Somatic mutations in PTEN occur in multiple tumors, most markedly glioblastomas. Germ-line mutations in PTEN are responsible for Cowden disease (CD), a rare autosomal dominant multiple-hamartoma syndrome. PTEN was sequenced from constitutional DNA from 25 families. Germ-line PTEN mutations were detected in all of five families with both breast cancer and CD, in one family with juvenile polyposis syndrome, and in one of four families with breast and thyroid tumors. In this last case, signs of CD were subtle and were diagnosed only in the context of mutation analysis. PTEN mutations were not detected in 13 families at high risk of breast and/or ovarian cancer. No PTEN-coding-sequence polymorphisms were detected in 70 independent chromosomes. Seven PTEN germ-line mutations occurred, five nonsense and two missense mutations, in six of nine PTEN exons. The wild-type PTEN allele was lost from renal, uterine, breast, and thyroid tumors from a single patient. Loss of PTEN expression was an early event, reflected in loss of the wild-type allele in DNA from normal tissue adjacent to the breast and thyroid tumors. In RNA from normal tissues from three families, mutant transcripts appeared unstable. Germ-line PTEN mutations predispose to breast cancer in association with CD, although the signs of CD may be subtle.

    View details for Web of Science ID 000071555900007

    View details for PubMedID 9399897

    View details for PubMedCentralID PMC1716102

  • Hotspots for unselected Ty1 transposition events on yeast chromosome III are near tRNA genes and LTR sequences CELL Ji, H., Moore, D. P., Blomberg, M. A., Braiterman, L. T., Voytas, D. F., Natsoulis, G., Boeke, J. D. 1993; 73 (5): 1007-1018

    Abstract

    A collection of yeast strains bearing single marked Ty1 insertions on chromosome III was generated. Over 100 such insertions were physically mapped by pulsed-field gel electrophoresis. These insertions are very nonrandomly distributed. Thirty-two such insertions were cloned by the inverted PCR technique, and the flanking DNA sequences were determined. The sequenced insertions all fell within a few very limited regions of chromosome III. Most of these regions contained tRNA coding regions and/or LTRs of preexisting transposable elements. Open reading frames were disrupted at a far lower frequency than expected for random transposition. The results suggest that the Ty1 integration machinery can detect regions of the genome that may represent "safe havens" for insertion. These regions of the genome do not contain any special DNA sequences, nor do they behave as particularly good targets for Ty1 integration in vitro, suggesting that the targeted regions have special properties allowing specific recognition in vivo.

    View details for Web of Science ID A1993LF06100016

    View details for PubMedID 8388781