Bio


Research interests: pharmacogenomics; systems pharmacology; clinical biomarker discovery; rare variant discovery and interpretation for cancer and other diseases

Education & Certifications


  • MA, University of Edinburgh, Business and Economics (2007)

All Publications


  • Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nature communications Jakubosky, D., Smith, E. N., D'Antonio, M., Jan Bonder, M., Young Greenwald, W. W., D'Antonio-Chronowska, A., Matsui, H., i2QTL Consortium, Stegle, O., Montgomery, S. B., DeBoever, C., Frazer, K. A., Bonder, M. J., Cai, N., Carcamo-Orive, I., D'Antonio, M., Frazer, K. A., Young Greenwald, W. W., Jakubosky, D., Knowles, J. W., Matsui, H., McCarthy, D. J., Mirauta, B. A., Montgomery, S. B., Quertermous, T., Seaton, D. D., Smail, C., Smith, E. N., Stegle, O. 2020; 11 (1): 2928

    Abstract

    Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42×) from 477 distinct individuals which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well-characterized maps of SVs and STRs to date.

    View details for DOI 10.1038/s41467-020-16481-5

    View details for PubMedID 32522985

  • Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nature communications Jakubosky, D., D'Antonio, M., Bonder, M. J., Smail, C., Donovan, M. K., Young Greenwald, W. W., Matsui, H., i2QTL Consortium, D'Antonio-Chronowska, A., Stegle, O., Smith, E. N., Montgomery, S. B., DeBoever, C., Frazer, K. A., Bonder, M. J., Cai, N., Carcamo-Orive, I., D'Antonio, M., Frazer, K. A., Young Greenwald, W. W., Jakubosky, D., Knowles, J. W., Matsui, H., McCarthy, D. J., Mirauta, B. A., Montgomery, S. B., Quertermous, T., Seaton, D. D., Smail, C., Smith, E. N., Stegle, O. 2020; 11 (1): 2927

    Abstract

    Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.

    View details for DOI 10.1038/s41467-020-16482-4

    View details for PubMedID 32522982

  • Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays ARCHIVES OF PATHOLOGY & LABORATORY MEDICINE Merker, J. D., Devereaux, K., Iafrate, A., Kamel-Reid, S., Kim, A. S., Moncur, J. T., Montgomery, S. B., Nagarajan, R., Portier, B. P., Routbort, M. J., Smail, C., Surrey, L. F., Vasalos, P., Lazar, A. J., Lindeman, N. 2019; 143 (4): 463–71
  • Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nature medicine Frésard, L., Smail, C., Ferraro, N. M., Teran, N. A., Li, X., Smith, K. S., Bonner, D., Kernohan, K. D., Marwaha, S., Zappala, Z., Balliu, B., Davis, J. R., Liu, B., Prybol, C. J., Kohler, J. N., Zastrow, D. B., Reuter, C. M., Fisk, D. G., Grove, M. E., Davidson, J. M., Hartley, T., Joshi, R., Strober, B. J., Utiramerur, S., Lind, L., Ingelsson, E., Battle, A., Bejerano, G., Bernstein, J. A., Ashley, E. A., Boycott, K. M., Merker, J. D., Wheeler, M. T., Montgomery, S. B. 2019

    Abstract

    It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

    View details for DOI 10.1038/s41591-019-0457-8

    View details for PubMedID 31160820

  • SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Anand, S., Kalesinskas, L., Smail, C., Tanigawa, Y. 2019; 24: 184–95

    Abstract

    Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly-available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP on publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach to leverage latent patterns across genome-wide epigenetic datasets to infer the biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.

    View details for PubMedID 30864321

  • SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs Anand, S., Kalesinskas, L., Smail, C., Tanigawa, Y., Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2019: 184–95
  • Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays. Archives of pathology & laboratory medicine Merker, J. D., Devereaux, K., Iafrate, A. J., Kamel-Reid, S., Kim, A. S., Moncur, J. T., Montgomery, S. B., Nagarajan, R., Portier, B. P., Routbort, M. J., Smail, C., Surrey, L. F., Vasalos, P., Lazar, A. J., Lindeman, N. I. 2018

    Abstract

    CONTEXT: Next-generation sequencing-based assays are being increasingly used in the clinical setting for the detection of somatic variants in solid tumors, but limited data are available regarding the interlaboratory performance of these assays. OBJECTIVE: To examine proficiency testing data from the initial College of American Pathologists (CAP) Next-Generation Sequencing Solid Tumor survey to report on laboratory performance. DESIGN: CAP proficiency testing results from 111 laboratories were analyzed for accuracy and associated assay performance characteristics. RESULTS: The overall accuracy observed for all variants was 98.3%. Rare false-negative results could not be attributed to sequencing platform, selection method, or other assay characteristics. The median and average of the variant allele fractions reported by the laboratories were within 10% of those orthogonally determined by digital polymerase chain reaction for each variant. The median coverage reported at the variant sites ranged from 1922 to 3297. CONCLUSIONS: Laboratories demonstrated an overall accuracy of greater than 98% with high specificity when examining 10 clinically relevant somatic single-nucleotide variants with a variant allele fraction of 15% or greater. These initial data suggest excellent performance, but further ongoing studies are needed to evaluate the performance of lower variant allele fractions and additional variant types.

    View details for PubMedID 30376374