My research interests lie primarily in two areas: 1) understanding the genetic architecture of complex diseases and traits, and 2) the clinical implementation of human genetics discoveries, for example, pharmacogenomics. I received my Ph.D. in Genomics and Computational Biology from the University of Pennsylvania. My dissertation focused on identifying genes associated with complex traits or diseases through gene-based analyses informed by genomic regulation. I am now a postdoctoral fellow in the Klein Lab (PharmGKB group), where I work on the Pharmacogenomics Clinical Annotation Tool (PharmCAT), a one-stop bioinformatics tool that analyzes pharmacogenomic variants from genotypic datasets and generates reports with genotype-based prescribing recommendations to support clinical pharmacogenomics implementation and treatment decisions.

Professional Education

  • PhD, University of Pennsylvania, Genomics and Computational Biology (2020)
  • BS, Fudan University, Life Sciences (2015)

Stanford Advisors


  • Pharmacogenomics Clinical Annotation Tool (PharmCAT)


    Stanford, CA, USA


All Publications

  • How to Run the Pharmacogenomics Clinical Annotation Tool (PharmCAT). Clinical Pharmacology and Therapeutics. Li, B., Sangkuhl, K., Keat, K., Whaley, R. M., Woon, M., Verma, S., Dudek, S., Tuteja, S., Verma, A., Whirl-Carrillo, M., Ritchie, M. D., Klein, T. E. 2022


    Pharmacogenomics (PGx) investigates the genetic influence on drug response and is an integral part of precision medicine. While PGx testing is becoming more common in clinical practice and may be reimbursed by Medicare/Medicaid and commercial insurance, interpreting PGx testing results for clinical decision support is still a challenge. The Pharmacogenomics Clinical Annotation Tool (PharmCAT) has been designed to tackle the need for transparent, automatic interpretations of patient genetic data. PharmCAT incorporates a patient's genotypes, annotates PGx information (allele, genotype, and phenotype), and generates a report with PGx guideline recommendations from the Clinical Pharmacogenetics Implementation Consortium (CPIC) and/or the Dutch Pharmacogenetics Working Group (DPWG). PharmCAT has introduced new features in the last two years, including a VCF preprocessor, the inclusion of DPWG guidelines, and functionalities for PGx research. For example, researchers can use the VCF preprocessor to prepare biobank-scale data for PharmCAT. In addition, PharmCAT enables the assessment of novel partial and combination alleles that are composed of known PGx variants and can call CYP2D6 genotypes based on SNPs and INDELs in the input VCF file. This tutorial provides materials and detailed step-by-step instructions for how to use PharmCAT in a versatile way that can be tailored to users' individual needs.

    View details for DOI 10.1002/cpt.2790

    View details for PubMedID 36350094
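    The abstract describes PharmCAT annotating a patient's genotypes with allele, genotype, and phenotype information before matching guideline recommendations. The sketch below illustrates only the genotype-to-phenotype step for one gene, using the CPIC activity-score approach for CYP2D6. It is not PharmCAT's code or API, and the allele activity table is a small, assumed subset that should be checked against current CPIC/PharmGKB tables.

```python
# Illustrative sketch, not PharmCAT code: map a CYP2D6 diplotype to a
# metabolizer phenotype with the CPIC activity-score approach.
from fractions import Fraction

# Small, assumed subset of CYP2D6 allele activity values; check current
# CPIC/PharmGKB allele function tables before relying on any of these.
ACTIVITY = {
    "*1": Fraction(1),       # normal function
    "*2": Fraction(1),       # normal function
    "*4": Fraction(0),       # no function
    "*10": Fraction(1, 4),   # decreased function
    "*41": Fraction(1, 2),   # decreased function
}


def activity_score(diplotype: str) -> Fraction:
    """Sum the activity values of both alleles in a diplotype such as '*1/*4'."""
    first, second = diplotype.split("/")
    return ACTIVITY[first] + ACTIVITY[second]


def phenotype(score: Fraction) -> str:
    """Translate an activity score into a phenotype using CPIC's 2019 CYP2D6 ranges."""
    if score == 0:
        return "Poor Metabolizer"
    if score <= 1:
        return "Intermediate Metabolizer"
    if score <= Fraction(9, 4):
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


for dip in ["*1/*1", "*1/*4", "*4/*4", "*10/*41"]:
    score = activity_score(dip)
    print(dip, float(score), phenotype(score))
```

    Exact fractions avoid floating-point surprises at the 1.0 and 2.25 cut-offs. In practice the harder step is calling the diplotype from VCF data in the first place, which this sketch takes as given.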

  • An Investigation of the Knowledge Overlap between Pharmacogenomics and Disease Genetics. Pacific Symposium on Biocomputing. Li, B., Whirl-Carrillo, M., Wright, M. W., Babb, L., Rehm, H. L., Klein, T. E. 2022; 27: 385-396


    Precision medicine faces many challenges, including the knowledge gap between disease genetics and pharmacogenomics (PGx). Disease genetics interprets the pathogenicity of genetic variants for diagnostic purposes, while PGx investigates the genetic influences on drug responses. Ideally, integrating PGx with disease genetics in clinical care would improve the quality of health care from disease diagnosis through drug prescribing. However, PGx genes or variants are usually not reported as secondary findings, even when they are included in a clinical genetic test for diagnostic purposes, although the detection of PGx variants can provide valuable drug prescribing recommendations. One underlying reason is the lack of a systematic classification of the knowledge overlap between PGx and disease genetics. Here, we address this issue by analyzing gene and genetic variant annotations from multiple expert-curated knowledge databases, including PharmGKB, CPIC, ClinGen and ClinVar. We further classified genes based on the strength of evidence supporting a gene's pathogenic role or PGx effect as well as the level of clinical actionability of a gene. Twenty-six genes were found to have pathogenic variation associated with germline diseases as well as strong evidence for a PGx association. These genes were classified into four sub-categories based on the distinct connection between the gene's pathogenic role and PGx effect. Moreover, we found thirteen RYR1 genetic variants that were annotated as pathogenic and whose PGx effect was also supported by a preponderance of evidence and given drug prescribing recommendations. Overall, we identified a nontrivial number of gene and genetic variant overlaps between disease genetics and PGx, which lays a foundation for combining PGx and disease genetics to improve clinical care from disease diagnosis to drug prescribing and adherence.

    View details for PubMedID 34890165
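    The abstract describes intersecting gene annotations from several knowledge bases and classifying the overlap by strength of evidence. A minimal sketch of that kind of set-based classification follows. The evidence labels mimic ClinGen gene-disease validity classes and PharmGKB clinical annotation levels, but the gene names, assignments, and "strong" cut-offs are hypothetical placeholders, not the paper's criteria or results.

```python
# Hypothetical sketch: intersect gene-level annotations from a disease-genetics
# source and a PGx source, then split the overlap by evidence strength.
# Gene names, labels, and thresholds are placeholders, not study data.
from typing import Dict, List, Set, Tuple

# Placeholder disease-genetics annotations: gene -> gene-disease validity class.
disease_validity: Dict[str, str] = {
    "GENE_A": "Definitive",
    "GENE_B": "Limited",
    "GENE_C": "Strong",
    "GENE_D": "Definitive",
}

# Placeholder PGx annotations: gene -> highest clinical annotation level.
pgx_level: Dict[str, str] = {
    "GENE_A": "1A",
    "GENE_C": "3",
    "GENE_D": "1B",
    "GENE_E": "1A",
}

# Assumed cut-offs for "strong" evidence on each side.
STRONG_DISEASE: Set[str] = {"Definitive", "Strong"}
STRONG_PGX: Set[str] = {"1A", "1B"}


def classify_overlap(disease: Dict[str, str], pgx: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (strong, other) lists of genes annotated in both sources."""
    shared = sorted(set(disease) & set(pgx))
    strong = [g for g in shared if disease[g] in STRONG_DISEASE and pgx[g] in STRONG_PGX]
    other = [g for g in shared if g not in strong]
    return strong, other


strong, weaker = classify_overlap(disease_validity, pgx_level)
print("strong overlap:", strong)
print("other overlap:", weaker)
```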

  • Mapping the human genetic architecture of COVID-19. Nature. COVID-19 Host Genetics Initiative. 2021


    The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity1,2, host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprising up to 49,562 COVID-19 patients from 46 studies across 19 countries. We report 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations with lung or autoimmune and inflammatory diseases3-7. They also represent potentially actionable mechanisms in response to infection. Mendelian randomization analyses support a causal role for smoking and body mass index for severe COVID-19, although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was made possible by the community of human genetic researchers coming together to prioritize sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.

    View details for DOI 10.1038/s41586-021-03767-x

    View details for PubMedID 34237774
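    The abstract reports genome-wide association meta-analyses combining results from 46 studies. As a generic illustration, assuming a fixed-effect inverse-variance weighted (IVW) model (a common choice; the consortium's exact pipeline is not described here), the per-variant combination step can be sketched as follows, with invented per-study estimates.

```python
# Sketch of fixed-effect inverse-variance weighted (IVW) meta-analysis for a
# single variant. Per-study effects below are invented for illustration.
import math


def ivw_meta(betas, ses):
    """Combine per-study effect sizes (e.g. log odds ratios) and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]            # precision weights
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p-value
    return beta, se, z, p


betas = [0.20, 0.15, 0.30]  # hypothetical log odds ratios from three studies
ses = [0.05, 0.08, 0.10]    # hypothetical standard errors
beta, se, z, p = ivw_meta(betas, ses)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

    Genome-wide significance is conventionally judged against p < 5 × 10^-8; real meta-analyses also assess between-study heterogeneity, which this sketch omits.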

  • Tissue specificity-aware TWAS (TSA-TWAS) framework identifies novel associations with metabolic, immunologic, and virologic traits in HIV-positive adults. PLoS Genetics. Li, B., Veturi, Y., Verma, A., Bradford, Y., Daar, E. S., Gulick, R. M., Riddler, S. A., Robbins, G. K., Lennox, J. L., Haas, D. W., Ritchie, M. D. 2021; 17 (4): e1009464


    As a relatively new methodology, the transcriptome-wide association study (TWAS) has gained interest due to its capacity for gene-level association testing. However, the development of TWAS has outpaced statistical evaluation of TWAS gene prioritization performance. Current TWAS methods vary in their underlying biological assumptions about the tissue specificity of transcriptional regulatory mechanisms. In a previous study from our group, this variation may have affected whether TWAS methods better identified associations in single tissues or in multiple tissues. We therefore designed simulation analyses to examine how the interplay between particular TWAS methods and tissue specificity of gene expression affects power and type I error rates for gene prioritization. We found that cross-tissue identification of expression quantitative trait loci (eQTLs) improved TWAS power. Single-tissue TWAS (i.e., PrediXcan) had robust power to identify genes expressed in single tissues but often found significant associations in the wrong tissues as well (and therefore had high false positive rates). Cross-tissue TWAS (i.e., UTMOST) had overall equal or greater power and controlled type I error rates for genes expressed in multiple tissues. Based on these simulation results, we applied a tissue specificity-aware TWAS (TSA-TWAS) analytic framework to look for gene-based associations with pre-treatment laboratory values from AIDS Clinical Trial Group (ACTG) studies. We replicated several proof-of-concept transcriptionally regulated gene-trait associations, including UGT1A1 (encoding bilirubin uridine diphosphate glucuronosyltransferase enzyme) with total bilirubin levels (p = 3.59 × 10^-12) and CETP (cholesteryl ester transfer protein) with high-density lipoprotein cholesterol (p = 4.49 × 10^-12). We also identified several novel genes associated with metabolic and virologic traits, as well as pleiotropic genes that linked plasma viral load, absolute basophil count, and/or triglyceride levels. By highlighting the advantages of different TWAS methods, our simulation study promotes a tissue specificity-aware TWAS analytic framework that revealed novel aspects of HIV-related traits.

    View details for DOI 10.1371/journal.pgen.1009464

    View details for PubMedID 33901188
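    The TWAS logic this abstract builds on has two steps: predict each person's genetically regulated expression (GReX) from cis-SNP genotypes, then test that prediction against the trait. A minimal sketch on simulated data follows. The weights, allele frequency, and effect size are hypothetical, and it shows neither PrediXcan's nor UTMOST's weight training, nor the TSA-TWAS framework itself.

```python
# Minimal single-tissue TWAS sketch on simulated individual-level data:
# (1) impute genetically regulated expression (GReX) as a weighted sum of
#     cis-SNP dosages, (2) regress the trait on GReX.
# Weights, allele frequency, and effect size are hypothetical placeholders;
# real pipelines estimate weights from eQTL reference panels.
import math
import random


def predict_grex(dosages, weights):
    """GReX for one person: sum over cis-SNPs of weight * allele dosage (0, 1, 2)."""
    return sum(w * g for w, g in zip(weights, dosages))


def simple_regression(x, y):
    """Slope, its standard error, and t-statistic for the model y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b, b / se_b


rng = random.Random(1)
weights = [0.4, -0.2, 0.1]  # hypothetical expression weights for 3 cis-SNPs
maf = 0.3                   # hypothetical minor allele frequency for every SNP
dosages = [[sum(rng.random() < maf for _ in range(2)) for _ in weights]
           for _ in range(500)]
grex = [predict_grex(d, weights) for d in dosages]
trait = [0.5 * g + rng.gauss(0, 1) for g in grex]  # simulated true GReX effect 0.5
slope, se_slope, t_stat = simple_regression(grex, trait)
print(f"slope={slope:.3f} se={se_slope:.3f} t={t_stat:.2f}")
```

    A single-tissue method repeats this per tissue with tissue-specific weights, whereas cross-tissue methods such as UTMOST share information across tissues when estimating weights; the abstract's simulations compare how those choices affect power and type I error.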

  • Genome-first approach to rare EYA4 variants and cardio-auditory phenotypes in adults. Human Genetics. Ahmadmehrabi, S., Li, B., Park, J., Devkota, B., Vujkovic, M., Ko, Y., Van Wagoner, D., Tang, W. H., Krantz, I., Ritchie, M., Regeneron Genetics Center, Brant, J., Ruckenstein, M. J., Epstein, D. J., Rader, D. J. 2021


    While newborns and children with hearing loss are routinely offered genetic testing, adults are rarely clinically tested for a genetic etiology. One clinically actionable result from genetic testing in children is the discovery of variants in syndromic hearing loss genes. EYA4 is a known hearing loss gene that is also involved in important pathways in cardiac tissue. The pleiotropic effects of rare EYA4 variants are poorly understood, and their prevalence in a large cohort has not been previously reported. We investigated cardio-auditory phenotypes in 11,451 individuals in a large biobank using a rare variant, genome-first approach to EYA4. We filtered 256 EYA4 variants carried by 6737 participants to 26 rare and predicted deleterious variants carried by 42 heterozygotes. We aggregated predicted deleterious EYA4 gene variants into a combined variable (i.e., "gene burden") and performed association studies across phenotypes compared to wildtype controls. We validated findings with replication in three independent cohorts and human tissue expression data. EYA4 gene burden was significantly associated with audiometric-proven hearing loss (HL) (p=[Formula: see text]), Mobitz Type II AV block (p=[Formula: see text]) and the syndromic presentation of HL and primary cardiomyopathy (p=0.0194). Analyses of audiogram, echocardiogram, and electrocardiogram data validated these associations. Prior reports have focused on identifying variants in families with severe or syndromic phenotypes. In contrast, using a genotype-first approach, we found that gene burden in EYA4 is associated with more subtle cardio-auditory phenotypes in an adult medical biobank population, including cardiac conduction disorders that have not been previously reported. We show the value of using a focused approach to uncover human disease related to pleiotropic gene variants and suggest a role for genetic testing in adults presenting with hearing loss.

    View details for DOI 10.1007/s00439-021-02263-6

    View details for PubMedID 33745059
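    The abstract's "gene burden" variable collapses many rare, predicted deleterious variants into one carrier indicator per person, which is then tested against phenotypes. A minimal sketch follows. The variant IDs and participants are hypothetical, and Fisher's exact test is swapped in as a simple illustrative association test; the study's own association models are not specified in this summary.

```python
# Hypothetical sketch of a "gene burden" analysis: collapse rare, predicted
# deleterious variants into one carrier flag per person, then test carriers
# vs. non-carriers for a phenotype with Fisher's exact test (an illustrative
# choice; variant IDs and participants are made up).
from math import comb


def carrier_status(genotypes, qualifying):
    """1 if the person carries at least one qualifying alternate allele, else 0."""
    return int(any(genotypes.get(v, 0) > 0 for v in qualifying))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p for the 2x2 table [[a, b], [c, d]].

    Rows: carriers, non-carriers. Columns: cases, controls.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carrier cases given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum all tables no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


records = [  # (genotypes: variant -> alt allele count, is_case)
    ({"var_1": 1}, True),
    ({"var_2": 1}, True),
    ({}, True),
    ({}, False),
    ({"var_3": 1}, False),
    ({}, False),
]
qualifying = {"var_1", "var_2"}  # var_3 assumed to fail the rarity/deleteriousness filter

flags = [(carrier_status(g, qualifying), case) for g, case in records]
a = sum(1 for f, case in flags if f and case)
b = sum(1 for f, case in flags if f and not case)
c = sum(1 for f, case in flags if not f and case)
d = sum(1 for f, case in flags if not f and not case)
print([[a, b], [c, d]], fisher_exact_two_sided(a, b, c, d))
```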

  • From GWAS to Gene: Transcriptome-Wide Association Studies and Other Methods to Functionally Understand GWAS Discoveries. Frontiers in Genetics. Li, B., Ritchie, M. D. 2021; 12: 713230


    Since their inception, genome-wide association studies (GWAS) have identified more than a hundred thousand single nucleotide polymorphism (SNP) loci that are associated with various complex human diseases or traits. The majority of GWAS discoveries are located in non-coding regions of the human genome and have unknown functions. The gap between non-coding GWAS discoveries and the downstream genes they affect hinders the investigation of complex disease mechanisms and the use of human genetics to improve clinical care. Meanwhile, advances in high-throughput sequencing technologies reveal important genomic regulatory roles that non-coding regions play in the transcriptional activities of genes. In this review, we focus on data-integrative bioinformatics methods that combine GWAS with functional genomics knowledge to identify genetically regulated genes. We categorize and describe two types of data-integrative methods. First, we describe fine-mapping methods. Fine-mapping is an exploratory approach that prioritizes likely causal variants underlying GWAS signals. Fine-mapping methods connect GWAS signals to potentially causal genes through statistical methods and/or functional annotations. Second, we discuss gene-prioritization methods, including colocalization, Mendelian randomization, and the transcriptome-wide association study (TWAS). These are hypothesis-generating approaches that evaluate whether genetic variants regulate genes via certain genetic regulatory mechanisms to influence complex traits. TWAS is a gene-based association approach that investigates associations between genetically regulated gene expression and complex diseases or traits. TWAS has gained popularity over the years due to its ability to reduce the multiple testing burden in comparison to variant-based analytic approaches. Multiple types of TWAS methods have been developed over the past 5 years with varied methodological designs and biological hypotheses. We discuss in depth how TWAS methods differ in many aspects and the challenges that different TWAS methods face. Overall, TWAS is a powerful tool for identifying complex trait-associated genes. With the advent of single-cell sequencing, chromosome conformation capture, gene editing technologies, and multiplexing reporter assays, we expect a more comprehensive understanding of genomic regulation and genetically regulated genes underlying complex human diseases and traits in the future.

    View details for DOI 10.3389/fgene.2021.713230

    View details for PubMedID 34659337
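    The review describes TWAS as testing genetically regulated expression against a trait. When only GWAS summary statistics are available, one widely used formulation (the FUSION-style weighted z-score, named here as an example rather than as the review's own method) combines per-SNP GWAS z-scores with expression weights and an LD matrix. A minimal sketch with made-up inputs:

```python
# Sketch of a summary-statistic TWAS z-score in the FUSION style:
#   z_TWAS = (w . z) / sqrt(w' R w)
# where w are expression weights for cis-SNPs, z are GWAS z-scores for the
# same SNPs, and R is their LD (correlation) matrix. Inputs are made up.
import math


def twas_z(weights, gwas_z, ld):
    """Return the TWAS z-score for one gene from summary statistics."""
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


weights = [0.5, 0.3]            # hypothetical eQTL weights
gwas_z = [4.0, 3.0]             # hypothetical GWAS z-scores
ld = [[1.0, 0.5], [0.5, 1.0]]   # hypothetical LD between the two SNPs
print(round(twas_z(weights, gwas_z, ld), 3))
```

    Because one test is run per gene (roughly 20,000 protein-coding genes) rather than per variant, a Bonferroni threshold such as 0.05 / 20,000 = 2.5 × 10^-6 is far less stringent than the conventional genome-wide 5 × 10^-8; this is one concrete sense of the reduced multiple testing burden the review mentions.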

  • Evaluating the frequency and the impact of pharmacogenetic alleles in an ancestrally diverse Biobank population. Journal of Translational Medicine. Verma, S. S., Keat, K., Li, B., Hoffecker, G., Risman, M., Sangkuhl, K., Whirl-Carrillo, M., Dudek, S., Verma, A., Klein, T. E., Ritchie, M. D., Tuteja, S. 2022; 20 (1): 550


    Pharmacogenomics (PGx) aims to utilize a patient's genetic data to enable safer and more effective prescribing of medications. The Clinical Pharmacogenetics Implementation Consortium (CPIC) provides guidelines with strong evidence for 24 genes that affect 72 medications. Despite strong evidence linking PGx alleles to drug response, there is a large gap in the implementation and return of actionable pharmacogenetic findings to patients in standard clinical practice. In this study, we evaluated opportunities for genetically guided medication prescribing in a diverse health system and determined the frequencies of actionable PGx alleles in an ancestrally diverse biobank population. A retrospective analysis of the Penn Medicine electronic health records (EHRs), which include ~3.3 million patients between 2012 and 2020, provides a snapshot of the trends in prescriptions for drugs with genotype-based prescribing guidelines ('CPIC level A or B') in the Penn Medicine health system. The Penn Medicine BioBank (PMBB) consists of a diverse group of 43,359 participants whose EHRs are linked to genome-wide SNP array and whole exome sequencing (WES) data. We used the Pharmacogenomics Clinical Annotation Tool (PharmCAT) to annotate PGx alleles from PMBB variant call format (VCF) files and identify samples with actionable PGx alleles. We identified ~316,000 unique patients who were prescribed at least 2 drugs with CPIC Level A or B guidelines. Genetic analysis in PMBB showed that 98.9% of participants carry one or more actionable PGx alleles for which treatment modification would be recommended. After linking the genetic data with prescription data from the EHR, 14.2% of participants (n = 6157) were prescribed medications that could be impacted by their genotype (as indicated by their PharmCAT report). For example, 856 participants who carried CYP2C19 reduced function alleles received clopidogrel, placing them at increased risk for major adverse cardiovascular events. When we stratified by genetic ancestry, we found disparities in PGx allele frequencies and clinical burden. Clopidogrel users of Asian ancestry in PMBB had significantly higher rates of CYP2C19 actionable alleles than clopidogrel users of European ancestry (p < 0.0001, OR = 3.68). Clinically actionable PGx alleles are highly prevalent in our health system, and many patients were prescribed medications that could be affected by PGx alleles. These results illustrate the potential utility of preemptive genotyping for tailoring of medications and implementation of PGx into routine clinical care.

    View details for DOI 10.1186/s12967-022-03745-5

    View details for PubMedID 36443877

    View details for PubMedCentralID 3098762
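    The ancestry comparison in this abstract is reported as an odds ratio (OR = 3.68). As a generic illustration of how an OR and a Woolf (log-based) confidence interval are computed from a 2x2 table, here is a sketch with hypothetical counts that are not PMBB data; the study's own test and any covariate adjustment are not described here.

```python
# Sketch: odds ratio and Woolf (log-based) 95% confidence interval from a
# 2x2 table. Counts are hypothetical, not data from the study.
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Table [[a, b], [c, d]]: rows are two groups, columns are carriers vs. non-carriers."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical clopidogrel users: group 1 (60 carriers, 40 non-carriers),
# group 2 (300 carriers, 700 non-carriers) of CYP2C19 actionable alleles.
or_, lower, upper = odds_ratio_ci(60, 40, 300, 700)
print(f"OR={or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

    The Woolf interval breaks down when any cell is zero; a continuity correction or an exact method would be needed in that case.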

  • A first update on mapping the human genetic architecture of COVID-19. Nature. Pathak, G. A., Polimanti, R., Karjalainen, J., Daly, M., Ganna, A., Daly, M. J., Stevens, C., Kanai, M., Liao, R. G., Trankiem, A., Balaconis, M. K., Nguyen, H., Solomonson, M., Veerapen, K., Ripatti, S., Nkambul, L., Bryant, S., Sankaran, V. G., Neale, B. M., Karczewski, K. J., Martin, A. R., Atkinson, E. G., Tsuo, K., Baya, N., Turley, P., Gupta, R., Walters, R. K., Palmer, D. S., Sarma, G., Cheng, N., Lu, W., Churchhouse, C., Goldstein, J., King, D., Zhou, W., Seed, C., Finucane, H., Satterstrom, F., Andrews, S. J., Sloofman, L. G., Sealfon, S. C., Hoggart, C., Underwood, S. J., Cordioli, M., Pirinen, M., Donner, K., Kivinen, K., Palotie, A., Kaunisto, M., Harerimana, N., Chwialkowska, K., Wolford, B., Roberts, G., Park, D., Ball, C. A., Coignet, M., McCurdy, S., Knight, S., Partha, R., Rhead, B., Zhang, M., Berkowitz, N., Gaddis, M., Noto, K., Ruiz, L., Pavlovic, M., Hong, E. L., Rand, K., Girshick, A., Guturu, H., Baltzell, A., Niemi, M. K., Pigazzini, S., Rahmouni, S., Georges, M., Belhaj, Y., Guntz, J., Claassen, S., Beguin, Y., Gofflot, S., Nkambule, L., Nkambul, L., Cusick, C., Moutschen, M., Misset, B., Darcis, G., Guiot, J., Azarzar, S., Malaise, O., Huynen, P., Meuris, C., Thys, M., Jacques, J., Leonard, P., Frippiat, F., Giot, J., Sauvage, A., Von Frenckell, C., Lambermont, B., Nakanishi, T., Morrison, D. R., Richards, J., Butler-Laporte, G., Forgetta, V., Ghosh, B., Laurent, L., Henry, D., Abdullah, T., Adeleye, O., Mamlouk, N., Kimchi, N., Afrasiabi, Z., Rezk, N., Vulesevic, B., Bouab, M., Guzman, C., Petitjean, L., Tselios, C., Xue, X., Afilalo, J., Adra, D., Mooser, V., Li, R., Belisle, A., Lepage, P., Ragoussis, J., Auld, D., Lathrop, G., Afilalo, M., Oliveira, M., Brenner, B., Brassard, N., Durand, M., Chasse, M., Kaufmann, D. E., Schurr, E., Hayward, C., Richmond, A., Baillie, J., Glessner, J. T., Hakonarson, H., Chang, X., Shaw, D. M., Below, J., Polikowski, H., Lauren, P. E., Chen, H., Zhu Wanying, Davis, L., Kerchberger, V., Campbell, A., Porteous, D. J., Fawns-Ritchie, C., Morris, M., McCormick, J. B., North, K., Glessner, J. R., Gignoux, C. R., Wicks, S. J., Crooks, K., Barnes, K. C., Daya, M., Shortt, J., Rafaels, N., Chavan, S., Timmers, P. J., Wilson, J. F., Tenesa, A., Kerr, S. M., D'Mellow, K., Shahin, D., El-Sherbiny, Y. M., El-Jawhari, J. J., von Hohenstaufen, K., Sobh, A., Eltoukhy, M. M., Mohamed, A. S., Elhadidy, T. A., Abd Elghafar, M. S., Elnagdy, M. H., Samir, A., Hegazy, M. F., Abdel-Aziz, M., Khafaga, W. T., El-Lawaty, W. M., Torky, M. S., Moahmed, H. S., El-shanshory, M. R., Yassen, A. M., Okasha, K., Eid, M. A., Medina-Gomez, C., Uitterlinden, A. G., Ikram, M., Magi, R., Milani, L., Metspalu, A., Laisk, T., Lall, K., Lepamets, M., Esko, T., Reimann, E., Alavere, H., Metsalu, K., Puusepp, M., Naaber, P., Laane, E., Pesukova, J., Peterson, P., Kisand, K., Tabri, J., Allos, R., Hensen, K., Starkopf, J., Ringmets, I., Tamm, A., Kallaste, A., Batini, C., Tobin, M. D., Venn, L. D., Lee, P. H., Shrine, N., Williams, A. T., Guyatt, A. L., John, C., Packer, R. J., Ali, A., Wang, X., Wain, L., Bee, C. E., Adams, E. L., Free, R. C., Hollox, E. J., Ruotsalainen, S., Kristiansson, K., Koskelainen, S., Perola, M., Rivolta, C., Quinodoz, M., Kamdar, D., Bochud, P., Boillat, N., Bibert, S., Nussle, S., Albrich, W., Suh, N., Neofytos, D., Erard, V., Voide, C., Friolet, R., Vollenweider, P., Pagani, J. L., Oddo, M., zu Bentrup, F., Conen, A., Clerc, O., Marchetti, O., Guillet, A., Guyat-Jacques, C., Foucras, S., Rime, M., Chassot, J., Jaquet, M., Viollet, R., Lannepoudenx, Y., Portopena, L., Bochud, P. Y., Desgranges, F., Filippidis, P., Guery, B., Haefliger, D., Kampouri, E. E., Manuel, O., Munting, A., Papadimitriou-Olivgeris, M., Regina, J., Rochat-Stettler, L., Suttels, Tadini, E., Tschopp, J., Van Singer, M., Viala, B., Boillat-Blanco, N., Brahier, T., Hugli, O., Meuwly, J. Y., Pantet, O., Nussle, S., Bochud, M., D'Acremont, Younes, S., Albrich, W. C., Suh, N., Cerny, A., O'Mahony, L., von Mering, C., Frischknecht, M., Kleger, G., Filipovic, M., Kahlert, C. R., Wozniak, H., Negro, T., Pugin, J., Bouras, K., Knapp, C., Egger, T., Perret, A., Montillier, P., di Bartolomeo, C., Barda, B., de Cid, R., Carreras, A., Galvan-Femenia, I., Blay, N., Farre, X., Sumoy, L., Cortes, B., Moreno, V., Kogevinas, M., Garcia-Aymerich, J., Castano-Vinyals, G., Dobano, C., Mercader, J., Mercader, J., Guindo-Martinez, M., Torrents, D., Gori, M., Picchiotti, N., Tanfoni, M., Renieri, A., Mari, F., Fallerini, C., Daga, S., Baldassarri, M., Fava, F., Frullanti, E., Valentino, F., Doddato, G., Giliberti, A., Bruttini, M., Croci, S., Meloni, I., Beligni, G., Di Sarno, L., Palmieri, M., Carriero, M., Alaverdian, D., Tita, R., Amitrano, S., Mencarelli, M., Lo Rizzo, C., Pinto, A., Montagnani, F., Tumbarello, M., Furini, S., Benetti, E., Zguro, K., Capitani, K., Bianchi, F., Lista, M., Mondelli, M., Bruno, R., Castelli, F., Quiros-Roldan, E., Degli Antoni, M., Vaghi, M., Rusconi, S., Riva, A., Siano, M., Gabrieli, A., Fabbiani, M., Rossetti, B., Rancan, I., Bargagli, E., Bergantini, L., D'Alessandro, M., Cameli, P., Bennett, D., Franchi, F., Anedda, F., Marcantonio, S., Scolletta, S., Mazzei, M., Guerrini, S., Cantarini, L., Conticini, E., Frediani, B., Tacconi, D., Spertilli, C., Feri, M., Donati, A., Scala, R., Guidelli, L., Spargi, G., Corridi, M., Nencioni, C., Croci, L., Bandini, M., Piacentini, P., Desanctis, E., Cappelli, S., Caldarelli, G., Canaccini, A., Verzuri, A., Anemoli, V., Ognibene, A., Pancrazzi, A., Lorubbio, M., Monforte, A., Miraglia, F., Girardis, M., Busani, S., Venturelli, S., Antinori, A., Emiliozzi, A., Vergori, A., Francisci, D., Schiaroli, E., Tommasi, A., Paciosi, F., Scotton, P., Andretta, F., Panese, S., Scaggiante, R., Gatti, F., Della Monica, M., Piscopo, C., Capasso, M., Russo, R., Andolfo, I., Iolascon, A., Merla, G., Fiorentino, G., Castori, M., Carella, M., Aucella, F., Di Biagio, A., Bassetti, M., Masucci, L., Sanguinetti, M., Guarnaccia, A., Valente, S., De Vivo, O., Mandala, M., Giorli, A., Salerni, L., Zucchi, P., Parravicini, P., Giannattasio, F., Trotta, T., Coiro, G., Coviello, D. A., Mussini, C., Tavecchia, L., Belli, M., Mancarella, S., Crotti, L., Parati, G., Rizzi, M., Maggiolo, F., Ripamonti, D., La Rovere, M., Sarzi-Braga, S., Bussotti, M., Ravaglia, S., Artuso, R., Andreucci, E., Perrella, A., Romani, D., Bergomi, P., Catena, E., Colombo, R., Vincenti, A., Ferri, C., Grassi, D., Pessina, G., Poscente, M., Di Pietro, M., Sabrina, R., Luchi, S., Dei, S., Sanarico, M., Gabbi, C., Ceri, S., Pinoli, P., Raimondi, F., Biscarini, F., Stella, A., Vecchia, M., Mantovani, S., Ludovisi, S., Zanella, I., Cossarizza, A., Parisi, S., Baratti, S., Squeo, G., Raggi, P., Marciano, C., Perna, R., Menatti, E., Lena, F., Martinelli, E., Bachetti, T., Suardi, C., Botta, G., Di Domenico, P., Barbieri, C., Tiseo, G., Falcone, M., Acquilini, D., Segala, F., Petrocelli, P., Baroni, S., van Heel, D. A., Hunt, K. A., van Heel, D., Trembath, R. C., Huang, Q., Martin, H. C., Mason, D., Wright, J., Trivedi, B., Finer, S., Akhtar, S., Anwar, M., Arciero, E., Ashraf, S., Breen, G., Chung, R., Curtis, C. J., Chowdhury, M., Colligan, G., Deloukas, P., Durham, C., Griffiths, C., Hurles, M., Hussain, S., Islam, K., Khan, A., Khan, A., Lavery, C., Lee, S., Lerner, R., MacArthur, D., MacLaughlin, B., Martin, H., Miah, S., Newman, B., Safa, N., Tahmasebi, F., Griffiths, C. J., Smith, A., Boughton, A. P., Li, K. W., LeFaive, J., Annis, A., Zollner, S., Wang, J., Beck, A., Niavarani, A., Sharififard, B., Aliannejad, R., Naderpour, Z., Amirsavadkouhi, A., Tadi, H., Aleagha, A., Ahmadi, S., Moghaddam, S., Adamsara, A., Saeedi, M., Abdollahi, H., Hosseini, A., Chariyavilaskul, P., Jantarabenjakul, W., Putchareon, O., Torvorapanit, P., Puthanakit, T., Hirankarn, N., Sodsai, P., Chamnanphon, M., Suttichet, T. B., Shotelersuk, V., Phokaew, C., Chetruengchai, W., Pongpanich, M., Suchartlikitwong, P., Nilaratanakul, V., Brumpton, B. M., Hveem, K., Asvold, B., Willer, C., Rogne, T., Solligard, E., Franke, L., Claringbould, A., Lopera, E., Warmerdam, R., van Blokland, I., Boezen, M., Deelen, P., Vonk, J. M., Lanting, P., Ori, A. S., Feng, Y., Weiss, S. T., Karlson, E. W., Woolley, A. E., Smoller, J. W., Murphy, S. N., Meigs, J. B., Green, R. C., Perez, E. F., Ascolillo, S., Thompson, R. C., Beckmann, N. D., Sebra, R. P., Gettler, K., Salib, I., Zyndorf, M., Schadt, E. E., Collins, B. L., Levy, T., Buxbaum, J. D., Britvan, B., Keller, K., Tang, L., Peruggia, M., Hiester, L. L., Niblo, K., Aksentijevich, A., Labkowsky, A., Karp, A., Zlatopolsky, M., Jordan, D. M., Chaudhary, K., Cho, J. H., Itan, Y., Do, R., Nadkarni, G. N., Preuss, M., Loos, R. F., Belbin, G. M., Abul-Husn, N. S., Kenny, E. E., Choi, S., O'Reilly, P., Charney, A. W., Huckins, L. M., Ferreira, M. R., Abecasis, G. R., Cantor, M. N., Kosmicki, J. A., Horowitz, J. E., Baras, A., Yadav, A., Leader, J. B., Gass, M. C., Justice, A. E., Chittoor, G., Josyula, N., Carey, D. J., Mirshahi, T., Hottenga, J., Bartels, M., de Geus, E. C., Nivard, M. G., Verma, A., Ritchie, M. D., Rader, D., Verma, S. S., Lucas, A., Bradford, Y., Li, B., Abedalthagafi, M., Al Harthi, F., Alsolm, E., Abu Safieh, L., Alowayn, A. M., Alqubaishi, F., Al Mutairi, A., AlBardis, H., Alotaibi, S., Fawzy, M. S., Alaamery, M., Massadeh, S., Almutairi, M., Alshareef, A., Suliman, B. A., Sawaji, M., AlMalik, A., Alqahtani, S., Baraka, D., Hasanato, R., Mangul, S., Aljawini, N., Albesher, N., Alkwai, S., Alswailm, M., Almohammed, I., Arabi, Y. M., Mahmoud, E. S., Khattab, A. K., Halawani, R. T., Alahmadey, Z. Z., Albakri, J. K., Felemban, W. A., Al-Awdah, L., Alghamdi, J., AlZahrani, D., AlDhawi, N., Almalki, F., Albeladi, M., Albader, A., AlJohani, S., Al-Afghani, H., Barhoush, E., Alghamdi, B., Jung, J., Alrashed, M., Zeberg, H., Maricic, T., Frithiof, R., Hultstrom, M., Lipcsey, M., Tardif, N., Rooyackers, O., Grip, J., Helgeland, O., Harris, J. R., Magnus, P., Lee, Y., Trogstad, L. S., Mangino, M., Spector, T. D., Emma, D., Moutsianas, L., Caulfield, M. J., Scott, R. H., Rendon, A., Kousathanas, A., Pasko, D., Walker, S., Stuckey, A., Odhams, C. A., Rhodes, D., Fowler, T., Chan, G., Arumugam, P., Wilson, D. J., Earle, S. G., Lin, S., Arning, N., Armstrong, J., Rudkin, J. K., Spencer, C. A., Koelling, N., Crook, D. W., Wyllie, D. H., O'Connell, A., Band, G., Callier, S., Soranzo, N., Zhao, J., Danesh, J., Di Angelantonio, E., Butterworth, A. S., Sun, Y., Huffman, J. E., O'Donnell, C. J., Peloso, G., Cho, K., Gaziano, J., Ho, Y., Tsao, P., Priest, J., Smieszek, S. P., Polymeropoulos, C., Polymeropoulos, V., Polymeropoulos, M. H., Przychodzen, B. P., Fernandez-Cadenas, I., Llucia-Carol, L., Cullell, N., Muino, E., Carcel-Marquez, J., Planas, A. M., Perez-Tur, J., DeDiego, M. L., Iglesias, L., Soriano, A., Rico, V., Aguero, D., Bedini, J. L., Domingo, C., Robles, V., Lozano, F., Ruiz-Jaen, F., Marquez, L., Gomez, J., Coto, E., Albaiceta, G. M., Garcia-Clemente, M., Dalmau, D., Arranz, M. J., Dietl, B., Serra-Llovich, A., Soler, P., Colobran, R., Martin-Nalda, A., Martinez, A., Bernardo, D., Fiz-Lopez, A., Arribas, E., De La Cal-Sabater, P., Rojo, S., Segura, T., Gonzalez-Villa, E., Serrano-Heras, G., Marti-Fabregas, J., Jimenez-Xarrie, E., Mimbrera, A., Masjuan, J., Garcia-Madrona, S., Dominguez-Mayoral, A., Villalonga, J., Menendez-Valladares, P., Chasman, D., Sesso, H. D., Manson, J. E., Buring, J. E., Ridker, P. M., Franco, G., Lee, S., Biesecker, L., COVID-19 Host Genetics Initiative. 2022: E1-E10

    View details for DOI 10.1038/s41586-022-04826-7

    View details for Web of Science ID 000835655400013

    View details for PubMedID 35922517

    View details for PubMedCentralID PMC9352569

  • Whole-genome sequencing reveals host factors underlying critical COVID-19. Nature. Kousathanas, A., Pairo-Castineira, E., Rawlik, K., Stuckey, A., Odhams, C. A., Walker, S., Russell, C. D., Malinauskas, T., Wu, Y., Millar, J., Shen, X., Elliott, K. S., Griffiths, F., Oosthuyzen, W., Morrice, K., Keating, S., Wang, B., Rhodes, D., Klaric, L., Zechner, M., Parkinson, N., Siddiq, A., Goddard, P., Donovan, S., Maslove, D., Nichol, A., Semple, M. G., Zainy, T., Maleady-Crowe, F., Todd, L., Salehi, S., Knight, J., Elgar, G., Chan, G., Arumugam, P., Patch, C., Rendon, A., Bentley, D., Kingsley, C., Kosmicki, J. A., Horowitz, J. E., Baras, A., Abecasis, G. R., Ferreira, M. R., Justice, A., Mirshahi, T., Oetjens, M., Rader, D. J., Ritchie, M. D., Verma, A., Fowler, T. A., Shankar-Hari, M., Summers, C., Hinds, C., Horby, P., Ling, L., McAuley, D., Montgomery, H., Openshaw, P. M., Elliott, P., Walsh, T., Tenesa, A., Fawkes, A., Murphy, L., Rowan, K., Ponting, C. P., Vitart, V., Wilson, J. F., Yang, J., Bretherick, A. D., Scott, R. H., Hendry, S., Moutsianas, L., Law, A., Caulfield, M. J., Baillie, J., GenOMICC Investigators, 23andMe Investigators, COVID-19 Human Genetics Initiative. 2022


    Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care1 or hospitalization2-4 after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication, or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.

    View details for DOI 10.1038/s41586-022-04576-6

    View details for Web of Science ID 000812472400001

    View details for PubMedID 35255492

  • A Genome-First Approach to Rare Variants in Dominant Postlingual Hearing Loss Genes in a Large Adult Population. Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery Ahmadmehrabi, S., Li, B., Hui, D., Park, J., Ritchie, M., Rader, D. J., Ruckenstein, M. J., Epstein, D. J., Brant, J. 2021: 1945998211029544


    OBJECTIVE: To investigate the importance of rare variants in adult-onset hearing loss.
    STUDY DESIGN: Genomic association study.
    SETTING: Large biobank from a tertiary care center.
    METHODS: We investigated rare variants (minor allele frequency <5%) in 42 autosomal dominant (DFNA) postlingual hearing loss (HL) genes in 16,657 unselected individuals in the Penn Medicine Biobank. We determined the prevalence of known pathogenic and predicted deleterious variants in subjects with audiometrically proven sensorineural hearing loss. We scanned across known postlingual DFNA HL genes to determine those most significantly contributing to the phenotype. We replicated findings in an independent cohort (UK Biobank).
    RESULTS: Although each variant was rare individually, more than 90% of participants carried at least 1 rare variant when variants were aggregated across all postlingual DFNA genes. Rare variants predicted to be deleterious were enriched in adults with audiometrically proven hearing loss (pure-tone average >25 dB; P = .015). Patients with a rare predicted deleterious variant had an odds ratio of 1.27 for HL compared with genotypic controls (P = .029). Gene burdens in DIABLO, PTPRQ, TJP2, and POU4F3 were independently associated with sensorineural hearing loss.
    CONCLUSION: Although prior reports have focused on common variants, we find that rare predicted deleterious variants in DFNA postlingual HL genes are enriched in patients with adult-onset HL in a large health care system population. We show the value of investigating rare variants to uncover hearing loss phenotypes related to implicated genes.

    View details for DOI 10.1177/01945998211029544

    View details for PubMedID 34281439
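
    The rare-variant burden idea in the abstract above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-variant record (gene, minor allele frequency, a predicted-deleterious flag, and the IDs of carriers), collapses qualifying variants (MAF < 5% and predicted deleterious) into a per-gene carrier set, and computes an unadjusted odds ratio from the resulting 2x2 table. The class and function names and the toy data are illustrative assumptions; the study's deleteriousness predictors, covariates, and statistical tests are not reproduced here.

```python
from dataclasses import dataclass, field

# Assumption: "rare" follows the abstract's definition, minor allele frequency < 5%.
MAF_THRESHOLD = 0.05


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    gene: str
    maf: float
    predicted_deleterious: bool
    carriers: set[str] = field(default_factory=set)  # IDs of individuals carrying the variant


def burden_carriers(variants: list[Variant], gene: str) -> set[str]:
    """Collapse rare, predicted-deleterious variants in one gene into a set of carriers."""
    carriers: set[str] = set()
    for v in variants:
        if v.gene == gene and v.maf < MAF_THRESHOLD and v.predicted_deleterious:
            carriers |= v.carriers
    return carriers


def odds_ratio(carriers: set[str], cases: set[str], controls: set[str]) -> float:
    """Unadjusted odds ratio of being a case for carriers vs. non-carriers.

    No continuity correction: raises ZeroDivisionError if b or c is zero.
    """
    a = len(cases & carriers)      # carrier cases
    b = len(controls & carriers)   # carrier controls
    c = len(cases - carriers)      # non-carrier cases
    d = len(controls - carriers)   # non-carrier controls
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Toy data with hypothetical IDs and frequencies.
    variants = [
        Variant("PTPRQ", 0.001, True, {"p1", "p2"}),
        Variant("PTPRQ", 0.20, True, {"p3"}),    # common: excluded
        Variant("PTPRQ", 0.01, False, {"p4"}),   # not predicted deleterious: excluded
        Variant("PTPRQ", 0.002, True, {"p7"}),
    ]
    carriers = burden_carriers(variants, "PTPRQ")
    cases = {"p1", "p2", "p5", "p6"}
    controls = {"p3", "p4", "p7", "p8"}
    print(sorted(carriers))                       # ['p1', 'p2', 'p7']
    print(odds_ratio(carriers, cases, controls))  # 3.0
```

    Collapsing to a per-person carrier indicator is one common way to pool many individually rare variants into a single gene-level test; real analyses would typically also adjust for covariates such as age, sex, and ancestry.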

  • How Does the "Cookie-Bite" Audiogram Shape Perform in Discriminating Genetic Hearing Loss in Adults? Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery Ahmadmehrabi, S., Li, B., Epstein, D. J., Ruckenstein, M. J., Brant, J. A. 2021: 1945998211015181


    "Cookie-bite" or U-shaped audiograms (specifically, those showing midfrequency sensorineural hearing loss [HL]) are traditionally taught to be associated with genetic HL; however, their utility as a screening tool has not been reported. We aimed to determine how well a cookie-bite audiogram shape distinguishes patients carrying putative loss-of-function variants in known HL genes from wild-type controls. We merged audiometric and exome sequencing data from adults enrolled in a large biobank at a tertiary care center. Of 321 patients, 50 carried a putative loss-of-function variant in an HL gene. The cookie-bite shape was present in 9 of those patients, resulting in low sensitivity (18%) and positive predictive value (15%) for identifying genetic carrier status; 84% of patients with a cookie-bite audiogram did not carry a genetic variant. A cookie-bite audiogram should not be used to screen adults for possible genetic testing.

    View details for DOI 10.1177/01945998211015181

    View details for PubMedID 34058916
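
    The screening metrics in the abstract above follow the standard definitions: sensitivity is the share of variant carriers whose audiogram has the cookie-bite shape (9 of 50), and positive predictive value (PPV) is the share of cookie-bite audiograms that belong to carriers. A minimal Python sketch of both is shown below; the function names are illustrative, and because the abstract does not state how many of the 321 patients had a cookie-bite audiogram, PPV is left as a function of that count rather than computed from an assumed number.

```python
def sensitivity(true_positives: int, all_carriers: int) -> float:
    """Share of carriers flagged by the screen: TP / (TP + FN)."""
    return true_positives / all_carriers


def positive_predictive_value(true_positives: int, all_flagged: int) -> float:
    """Share of flagged (cookie-bite) audiograms that belong to carriers: TP / (TP + FP)."""
    return true_positives / all_flagged


if __name__ == "__main__":
    # From the abstract: 50 carriers, 9 of whom had a cookie-bite audiogram.
    print(f"sensitivity = {sensitivity(9, 50):.0%}")  # sensitivity = 18%
    # The total number of cookie-bite audiograms (TP + FP) is not given in the
    # abstract, so PPV is not evaluated here.
```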

  • Influence of tissue context on gene prioritization for predicted transcriptome-wide association studies. Pacific Symposium on Biocomputing Li, B., Veturi, Y., Bradford, Y., Verma, S. S., Verma, A., Lucas, A. M., Haas, D. W., Ritchie, M. D. 2019; 24: 296–307


    Transcriptome-wide association studies (TWAS) have recently gained great attention due to their ability to prioritize complex trait-associated genes and promote potential therapeutics development for complex human diseases. TWAS integrates genotypic data with expression quantitative trait loci (eQTLs) to predict genetically regulated gene expression components and associates predictions with a trait of interest. As such, TWAS can prioritize genes whose differential expression contributes to the trait of interest and provide mechanistic explanations of complex traits. Tissue-specific eQTL information grants TWAS the ability to perform association analysis on tissues whose gene expression profiles are otherwise hard to obtain, such as liver and heart. However, as eQTLs are tissue context-dependent, whether and how the tissue-specificity of eQTLs influences TWAS gene prioritization has not been fully investigated. In this study, we addressed this question by adopting two distinct TWAS methods, PrediXcan and UTMOST, which assume single-tissue and integrative-tissue effects of eQTLs, respectively. Thirty-eight baseline laboratory traits in 4,360 antiretroviral treatment-naïve individuals from the AIDS Clinical Trials Group (ACTG) studies comprised the input dataset for TWAS. We performed TWAS in a tissue-specific manner and obtained a total of 430 significant gene-trait associations (q-value < 0.05) across multiple tissues. Single tissue-based analysis by PrediXcan contributed 116 of the 430 associations, including 64 unique gene-trait pairs in 28 tissues. Integrative tissue-based analysis by UTMOST found the other 314 significant associations, which include 50 unique gene-trait pairs across all 44 tissues. Both analyses were able to replicate some associations identified in past variant-based genome-wide association studies (GWAS), such as high-density lipoprotein (HDL) and CETP (PrediXcan, q-value = 3.2e-16). Both analyses also identified novel associations. Moreover, single tissue-based and integrative tissue-based analyses shared 11 of 103 unique gene-trait pairs, for example, PSRC1-low-density lipoprotein (PrediXcan's lowest q-value = 8.5e-06; UTMOST's lowest q-value = 1.8e-05). This study suggests that single tissue-based analysis may have performed better at discovering gene-trait associations when results from all tissues were combined. Integrative tissue-based analysis was better at prioritizing genes in multiple tissues and in trait-related tissues. Additional exploration is needed to confirm this conclusion. Finally, although single tissue-based and integrative tissue-based analyses shared significant novel discoveries, tissue context-dependency of eQTLs impacted TWAS gene prioritization. This study provides preliminary data to support continued work on tissue context-dependency in eQTL studies and TWAS.

    View details for PubMedID 30864331

    View details for PubMedCentralID PMC6417797
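
    The two TWAS steps described in the abstract above (predicting the genetically regulated component of expression from eQTL weights, then associating that prediction with a trait) can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions: the eQTL weights, genotype dosages, and trait values are made-up toy numbers, and the association step is a plain least-squares slope with no covariates or p-value. PrediXcan and UTMOST each train their own weight models and use their own test statistics, which this sketch does not reproduce.

```python
from statistics import fmean


def predict_expression(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Genetically regulated expression for one individual:
    sum over eQTL SNPs of (effect-allele dosage x eQTL weight)."""
    return sum(dosages[snp] * w for snp, w in weights.items())


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (simple linear regression, no covariates)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical tissue-specific eQTL weights for one gene (SNP ID -> weight).
    weights = {"snp1": 0.5, "snp2": -0.25}
    # Hypothetical dosages (0, 1 or 2 copies of the effect allele) and trait values.
    genotypes = [
        {"snp1": 0, "snp2": 2},
        {"snp1": 1, "snp2": 0},
        {"snp1": 2, "snp2": 1},
        {"snp1": 2, "snp2": 0},
    ]
    trait = [1.0, 2.0, 2.5, 3.0]

    predicted = [predict_expression(g, weights) for g in genotypes]
    print(predicted)  # [-0.5, 0.5, 0.75, 1.0]
    print(ols_slope(predicted, trait))
```

    In a tissue-specific analysis like the one in the abstract, this prediction-and-test step would be repeated per gene and per tissue with that tissue's weights, followed by multiple-testing control (the study reports q-values).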

  • Collective feature selection to identify crucial epistatic variants BIODATA MINING Verma, S. S., Lucas, A., Zhang, X., Veturi, Y., Dudek, S., Li, B., Li, R., Urbanowicz, R., Moore, J. H., Kim, D., Ritchie, M. D. 2018; 11: 5


    Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex diseases and traits. Detection of epistatic interactions remains a challenge due to the large number of features and relatively small sample size as input, which leads to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach.

    Through our simulation study, we propose a collective feature selection approach that selects features in the "union" of the best-performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We chose our top-performing methods and carried forward to downstream analysis the union of their selected variables, where each method contributes a user-defined percentage of its top variants. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criterion for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and gradient boosting, work best under other simulation criteria. Thus, a collective approach proves more beneficial for selecting variables with epistatic effects, including in low effect size datasets and across different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with body mass index (BMI) in ~44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of the DiscovEHR collaboration).

    In this study, we showed via simulation that selecting variables with a collective feature selection approach could help select true positive epistatic variables more frequently than applying any single feature selection method. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.

    View details for DOI 10.1186/s13040-018-0168-6

    View details for Web of Science ID 000430966900001

    View details for PubMedID 29713383

    View details for PubMedCentralID PMC5907720
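
    The union step of the collective feature selection approach described in the abstract above can be sketched as: rank variants by each method's importance score, keep a user-defined top fraction from each method (the BMI analysis used the top 1%), and pass the union of those selections downstream. The method names, scores, and the `fraction` parameter below are hypothetical placeholders; the scoring performed by the actual tools (e.g. MDR, Ranger, gradient boosting) is not reproduced here.

```python
import math


def top_fraction(scores: dict[str, float], fraction: float) -> set[str]:
    """Return the highest-scoring variants, keeping ceil(fraction * n) of them
    (at least one, so very small fractions on small inputs still select something)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def collective_selection(method_scores: dict[str, dict[str, float]], fraction: float) -> set[str]:
    """Union of the top-`fraction` variants selected by each feature-selection method."""
    selected: set[str] = set()
    for scores in method_scores.values():
        selected |= top_fraction(scores, fraction)
    return selected


if __name__ == "__main__":
    # Hypothetical importance scores from two methods over the same four variants.
    method_scores = {
        "method_a": {"rs1": 0.9, "rs2": 0.1, "rs3": 0.5, "rs4": 0.2},
        "method_b": {"rs1": 0.3, "rs2": 0.8, "rs3": 0.4, "rs4": 0.1},
    }
    print(sorted(collective_selection(method_scores, 0.25)))  # ['rs1', 'rs2']
```

    Taking the union rather than the intersection keeps any variant that at least one well-performing method ranks highly, which matches the abstract's rationale that no single method works best across all genetic architectures.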