Bio


I am currently a Postdoctoral Fellow with Dr. Thomas Quertermous at Stanford University. I joined the lab with more than 7 years of research experience in computational biology, during which I worked with multi-omics data across multiple diseases to gain a deeper understanding of disease identification and progression.
My background in engineering and bioinformatics provides a strong foundation for the studies proposed in this application, which investigate the genetics and genomics of smooth muscle cell biology in the context of vascular disease. I first pursued a Bachelor's in Biotechnology at Banasthali Vidyapeeth, one of the premier institutes in India, and received my degree in 2007. After qualifying in the IIT-JAM exam in 2010, I joined the Master's in Science (Biotechnology) program at the prestigious Indian Institute of Technology Roorkee, in a program of engineering and technology. After my Master's, I joined Dr. Vinod Scaria's lab at CSIR-IGIB as a Project Fellow. During my tenure as Project Fellow from 2012 to 2014, I had the opportunity to work with transcriptomics data from zebrafish, rat and human cell lines to understand the role of long non-coding RNAs and miRNAs. I also worked on clinical datasets of autoimmune disorders. With one and a half years of research experience and a UGC fellowship awarded through the NET-JRF examination, I continued working with Dr. Vinod Scaria to pursue my PhD. My doctoral research focused on the identification and characterization of circular RNAs, and this work has now been published in multiple manuscripts listed below. Over the years at CSIR-IGIB, I have had the chance to work on interesting ideas with multiple collaborating groups. One collaborator was Dr. Sridhar Sivasubbu, with whom I worked to understand the transcript-level interactions between mitochondria and the nucleus, using zebrafish as a model organism.
Given my interest in the translational aspects of biology, I had the opportunity to work as part of the GUaRDIAN Consortium with Dr. Vinod Scaria and Dr. Sridhar Sivasubbu at CSIR-IGIB. This pioneering project is the largest network of researchers and clinicians in India sequencing patient DNA to identify rare SNVs and structural variants responsible for muscular dystrophy in these patients. In the interest of advancing genomics in clinical and healthcare settings, I was selected as Intel Fellow 2019 to work for the Intel-IGIB collaboration focussing on “Accelerating Clinical Analysis and Interpretation of Genomic Data through advanced tools/libraries”. Our project was selected among the top 3 from 50 premier research institutes, and I was awarded the Intel-India Fellowship for a year to pursue this project. I was also part of the core team of IndiGen (Genomes for Public Health in India). With the spread of COVID-19 around the world, our group contributed by sequencing and analysing COVID-19 genomes to get a better understanding of the disease, and I had the opportunity to be part of the core team analysing the viral sequencing datasets and viral assembly.
I am extremely pleased to have joined the Quertermous lab at Stanford to study the molecular mechanisms of cardiovascular disease. The work that I am pursuing in this laboratory, and that is proposed in this application, is directly in line with my personal aspiration to start an independent career in scientific research, working on projects with high translational value and of interest to public health.

Honors & Awards


  • Postdoctoral Fellow, Stanford University (1-9-2020 to present)
  • Intel-India Fellow, CSIR-Institute of Genomics and Integrative Biology and Intel pvt ltd. (2019-2020)
  • Senior Research Fellow, CSIR-Institute of Genomics and Integrative Biology (2016-2019)
  • Junior Research Fellow, CSIR-Institute of Genomics and Integrative Biology (2014-2016)
  • Project Fellow, CSIR-Institute of Genomics and Integrative Biology (2012-2014)

All Publications


  • 1029 genomes of self-declared healthy individuals from India reveal prevalent and clinically relevant cardiac ion channelopathy variants. Human genomics Bajaj, A., Senthivel, V., Bhoyar, R., Jain, A., Imran, M., Rophina, M., Divakar, M. K., Jolly, B., Verma, A., Mishra, A., Sharma, D., Deepti, S., Sharma, G., Bansal, R., Yadav, R., Scaria, V., Naik, N., Sivasubbu, S. 2022; 16 (1): 30

    Abstract

    The prevalence and genetic spectrum of cardiac channelopathies exhibit population-specific differences. We aimed to understand the spectrum of cardiac channelopathy-associated variations in India, which is characterised by a genetically diverse population and is largely understudied in the context of these disorders. We utilised the IndiGenomes dataset comprising 1029 whole genomes from self-declared healthy individuals as a template to filter variants in 36 genes known to cause cardiac channelopathies. Our analysis revealed 186,782 variants, of which we filtered 470 variants that were identified as possibly pathogenic (440 nonsynonymous, 30 high-confidence predicted loss of function). About 26% (124 out of 470) of these variants were unique to the Indian population as they were not reported in the global population datasets and published literature. Classification of 470 variants by ACMG/AMP guidelines unveiled 13 pathogenic/likely pathogenic (P/LP) variants mapping to 19 out of the 1029 individuals. Further query of 53 probands in an independent cohort of cardiac channelopathy, using exome sequencing, revealed the presence of 3 out of the 13 P/LP variants. The identification of p.G179Sfs*62, p.R823W and c.420+2T>C variants in KCNQ1, KCNH2 and CASQ2 genes, respectively, validate the significance of the P/LP variants in the context of clinical applicability as well as for large-scale population analysis. A compendium of ACMG/AMP classified cardiac channelopathy variants in 1029 self-declared healthy Indian population was created. A conservative genotypic prevalence was estimated to be 0.9-1.8% which poses a huge public health burden for a country with large population size like India. In the majority of cases, these disorders are manageable and the risk of sudden cardiac death can be alleviated by appropriate lifestyle modifications as well as treatment regimens/clinical interventions. Clinical utility of the obtained variants was demonstrated using a cardiac channelopathy patient cohort. Our study emphasises the need for large-scale population screening to identify at-risk individuals and take preventive measures. However, we suggest cautious clinical interpretation to be exercised by taking other cardiac channelopathy risk factors into account.

    View details for DOI 10.1186/s40246-022-00402-2

    View details for PubMedID 35932045

    View details for PubMedCentralID PMC9354277

  • Landscape of Variability in Chemosensory Genes Associated With Dietary Preferences in Indian Population: Analysis of 1029 Indian Genomes. Frontiers in genetics Prakrithi, P., Jha, P., Jaiswal, J., Sharma, D., Bhoyar, R. C., Jain, A., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Scaria, V., Sivasubbu, S., Mukerji, M. 2022; 13: 878134

    Abstract

    Perception and preferences for food and beverages determine dietary behaviour and health outcomes. Inherent differences in chemosensory genes, ethnicity, geo-climatic conditions, and sociocultural practices are other determinants. We aimed to study the variation landscape of chemosensory genes involved in perception of taste, texture, odour, temperature and burning sensations through analysis of 1,029 genomes of the IndiGen project and diverse continental populations. SNPs from 80 chemosensory genes were studied in whole genomes of 1,029 IndiGen samples and 2054 from the 1000 Genomes project. Population genetics approaches were used to infer ancestry of IndiGen individuals, gene divergence and extent of differentiation among studied populations. 137,760 SNPs including common and rare variants were identified in IndiGenomes with 62,950 novel (46%) and 48% shared with the 1,000 Genomes. Genes associated with olfaction harbored most SNPs followed by those associated with differences in perception of salt and pungent tastes. Across species, receptors for bitter taste were the most diverse compared to others. Three predominant ancestry groups within IndiGen were identified based on population structure analysis. We also identified 1,184 variants that exhibit differences in frequency of derived alleles and high population differentiation (FST ≥0.3) in Indian populations compared to European, East Asian and African populations. Examples include ADCY10, TRPV1, RGS6, OR7D4, ITPR3, OPRM1, TCF7L2, and RUNX1. This is a first of its kind of study on baseline variations in genes that could govern cuisine designs, dietary preferences and health outcomes. This would be of enormous utility in dietary recommendations for precision nutrition both at population and individual level.

    View details for DOI 10.3389/fgene.2022.878134

    View details for PubMedID 35903357

    View details for PubMedCentralID PMC9315315

  • Comprehensive Assessment of Indian Variations in the Druggable Kinome Landscape Highlights Distinct Insights at the Sequence, Structure and Pharmacogenomic Stratum. Frontiers in pharmacology Panda, G., Mishra, N., Sharma, D., Kutum, R., Bhoyar, R. C., Jain, A., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Garg, P., Banerjee, P., Sivasubbu, S., Scaria, V., Ray, A. 2022; 13: 858345

    Abstract

    India confines more than 17% of the world's population and has a diverse genetic makeup with several clinically relevant rare mutations belonging to many sub-group which are undervalued in global sequencing datasets like the 1000 Genome data (1KG) containing limited samples for Indian ethnicity. Such databases are critical for the pharmaceutical and drug development industry where diversity plays a crucial role in identifying genetic disposition towards adverse drug reactions. A qualitative and comparative sequence and structural study utilizing variant information present in the recently published, largest curated Indian genome database (IndiGen) and the 1000 Genome data was performed for variants belonging to the kinase coding genes, the second most targeted group of drug targets. The sequence-level analysis identified similarities and differences among different populations based on the nsSNVs and amino acid exchange frequencies whereas a comparative structural analysis of IndiGen variants was performed with pathogenic variants reported in UniProtKB Humsavar data. The influence of these variations on structural features of the protein, such as structural stability, solvent accessibility, hydrophobicity, and the hydrogen-bond network was investigated. In-silico screening of the known drugs to these Indian variation-containing proteins reveals critical differences imparted in the strength of binding due to the variations present in the Indian population. In conclusion, this study constitutes a comprehensive investigation into the understanding of common variations present in the second largest population in the world and investigating its implications in the sequence, structural and pharmacogenomic landscape. The preliminary investigation reported in this paper, supporting the screening and detection of ADRs specific to the Indian population could aid in the development of techniques for pre-clinical and post-market screening of drug-related adverse events in the Indian population.

    View details for DOI 10.3389/fphar.2022.858345

    View details for PubMedID 35865963

    View details for PubMedCentralID PMC9294532

  • Pharmacogenomic landscape of Indian population using whole genomes. Clinical and translational science Sahana, S., Bhoyar, R. C., Sivadas, A., Jain, A., Imran, M., Rophina, M., Senthivel, V., Kumar Diwakar, M., Sharma, D., Mishra, A., Sivasubbu, S., Scaria, V. 2022

    Abstract

    Ethnic differences in pharmacogenomic (PGx) variants have been well documented in literature and could significantly impact variability in response and adverse events to therapeutics. India is a large country with diverse ethnic populations of distinct genetic architecture. India's national genome sequencing initiative (IndiGen) provides a unique opportunity to explore the landscape of PGx variants using population-scale whole genome sequences. We have analyzed the IndiGen variation dataset (N = 1029 genomes) along with global population scale databases to map the most prevalent clinically actionable and potentially deleterious PGx variants among Indians. Differential frequencies for the known and novel variants were studied and interaction of the disrupted PGx genes affecting drug responses were analyzed by performing a pathway analysis. We have highlighted significant differences in the allele frequencies of clinically actionable PGx variants in Indians when compared to the global populations. We identified 134 mostly common (allele frequency [AF] > 0.1) potentially deleterious PGx variants that could alter or inhibit the function of 102 pharmacogenes in Indians. We also estimate that, on average, each Indian individual carried eight PGx variants (single nucleotide variants) that have a direct impact on the choice of treatment or drug dosing. We have also highlighted clinically actionable PGx variants and genes for which preemptive genotyping is most recommended for the Indian population. The study has put forward the most comprehensive PGx landscape of the Indian population from whole genomes that could enable optimized drug selection and genotype-guided prescriptions for improved therapeutic outcomes and minimizing adverse events.

    View details for DOI 10.1111/cts.13153

    View details for PubMedID 35338580

  • An Alu insertion map of the Indian population: identification and analysis in 1021 genomes of the IndiGen project. NAR genomics and bioinformatics Prakrithi, P., Singhal, K., Sharma, D., Jain, A., Bhoyar, R. C., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Scaria, V., Sivasubbu, S., Mukerji, M. 2022; 4 (1): lqac009

    Abstract

    Actively retrotransposing primate-specific Alu repeats display insertion-deletion (InDel) polymorphism through their insertion at new loci. In the global datasets, Indian populations remain under-represented and so do their Alu InDels. Here, we report the genomic landscape of Alu InDels from the recently released 1021 Indian Genomes (IndiGen) (available at https://clingen.igib.res.in/indigen). We identified 9239 polymorphic Alu insertions that include private (3831), rare (3974) and common (1434) insertions with an average of 770 insertions per individual. We achieved an 89% PCR validation of the predicted genotypes in 94 samples tested. About 60% of identified InDels are unique to IndiGen when compared to other global datasets; 23% of sites were shared with both SGDP and HGSVC; among these, 58% (1289 sites) were common polymorphisms in IndiGen. The insertions not only show a bias for genic regions, with a preference for introns but also for the associated genes showing enrichment for processes like cell morphogenesis and neurogenesis (P-value < 0.05). Approximately, 60% of InDels mapped to genes present in the OMIM database. Finally, we show that 558 InDels can serve as ancestry informative markers to segregate global populations. This study provides a valuable resource for baseline Alu InDels that would be useful in population genomics.

    View details for DOI 10.1093/nargab/lqac009

    View details for PubMedID 35178516

    View details for PubMedCentralID PMC8846365

  • ZEB2 Shapes the Epigenetic Landscape of Atherosclerosis. Circulation Cheng, P., Wirka, R. C., Clarke, L., Zhao, Q., Kundu, R., Nguyen, T., Nair, S., Sharma, D., Kim, H., Shi, H., Assimes, T., Kim, J., Kundaje, A., Quertermous, T. 2022; 145 (6): 469–485

    Abstract

    Background: Smooth muscle cells (SMC) transition into a number of different phenotypes during atherosclerosis, including those that resemble fibroblasts and chondrocytes, and make up the majority of cells in the atherosclerotic plaque. To better understand the epigenetic and transcriptional mechanisms that mediate these cell state changes, and how they relate to risk for coronary artery disease (CAD), we have investigated the causality and function of transcription factors (TFs) at genome wide associated loci. Methods: We employed CRISPR-Cas9 genome and epigenome editing to identify the causal gene and cell(s) for a complex CAD GWAS signal at 2q22.3. Subsequently, single-cell epigenetic and transcriptomic profiling in murine models and human coronary artery smooth muscle cells were employed to understand the cellular and molecular mechanism by which this CAD risk gene exerts its function. Results: CRISPR-Cas9 genome and epigenome editing showed that the complex CAD genetic signals within a genomic region at 2q22.3 lie within smooth muscle long-distance enhancers for ZEB2, a TF extensively studied in the context of epithelial mesenchymal transition (EMT) in development and cancer. ZEB2 regulates SMC phenotypic transition through chromatin remodeling that obviates accessibility and disrupts both Notch and TGFβ signaling, thus altering the epigenetic trajectory of SMC transitions. SMC specific loss of ZEB2 resulted in an inability of transitioning SMCs to turn off contractile programming and take on a fibroblast-like phenotype, but accelerated the formation of chondromyocytes, mirroring features of high-risk atherosclerotic plaques in human coronary arteries. Conclusions: These studies identify ZEB2 as a new CAD GWAS gene that affects features of plaque vulnerability through direct effects on the epigenome, providing a new therapeutic approach to target vascular disease.

    View details for DOI 10.1161/CIRCULATIONAHA.121.057789

    View details for PubMedID 34990206

  • Genetic epidemiology of autoinflammatory disease variants in Indian population from 1029 whole genomes. Journal, genetic engineering & biotechnology Jain, A., Bhoyar, R. C., Pandhare, K., Mishra, A., Sharma, D., Imran, M., Senthivel, V., Divakar, M. K., Rophina, M., Jolly, B., Batra, A., Sharma, S., Siwach, S., Jadhao, A. G., Palande, N. V., Jha, G. N., Ashrafi, N., Mishra, P. K., A K, V., Jain, S., Dash, D., Kumar, N. S., Vanlallawma, A., Sarma, R. J., Chhakchhuak, L., Kalyanaraman, S., Mahadevan, R., Kandasamy, S., B M, P., Rajagopal, R. E., Ramya J, E., Devi P, N., Bajaj, A., Gupta, V., Mathew, S., Goswami, S., Mangla, M., Prakash, S., Joshi, K., S, S., Gajjar, D., Soraisham, R., Yadav, R., Devi, Y. S., Gupta, A., Mukerji, M., Ramalingam, S., B K, B., Scaria, V., Sivasubbu, S. 2021; 19 (1): 183

    Abstract

    Autoinflammatory disorders are the group of inherited inflammatory disorders caused due to the genetic defect in the genes that regulates innate immune systems. These have been clinically characterized based on the duration and occurrence of unprovoked fever, skin rash, and patient's ancestry. There are several autoinflammatory disorders that are found to be prevalent in a specific population and whose disease genetic epidemiology within the population has been well understood. However, India has a limited number of genetic studies reported for autoinflammatory disorders till date. The whole genome sequencing and analysis of 1029 Indian individuals performed under the IndiGen project persuaded us to perform the genetic epidemiology of the autoinflammatory disorders in India. We have systematically annotated the genetic variants of 56 genes implicated in autoinflammatory disorder. These genetic variants were reclassified into five categories (i.e., pathogenic, likely pathogenic, benign, likely benign, and variant of uncertain significance (VUS)) according to the American College of Medical Genetics and Association of Molecular pathology (ACMG-AMP) guidelines. Our analysis revealed 20 pathogenic and likely pathogenic variants with significant differences in the allele frequency compared with the global population. We also found six causal founder variants in the IndiGen dataset belonging to different ancestry. We have performed haplotype prediction analysis for founder mutations haplotype that reveals the admixture of the South Asian population with other populations. The cumulative carrier frequency of the autoinflammatory disorder in India was found to be 3.5% which is much higher than reported. With such frequency in the Indian population, there is a great need for awareness among clinicians as well as the general public regarding the autoinflammatory disorder. To the best of our knowledge, this is the first and most comprehensive population scale genetic epidemiological study being reported from India.

    View details for DOI 10.1186/s43141-021-00268-2

    View details for PubMedID 34905135

  • Asymptomatic reactivation of SARS-CoV-2 in a child with neuroblastoma characterised by whole genome sequencing. IDCases Yadav, S. P., Thakkar, D., Bhoyar, R. C., Jain, A., Wadhwa, T., Imran, M., Jolly, B., Divakar, M. K., Kapoor, R., Rastogi, N., Sharma, D., Sehgal, P., Ranjan, G., Sivasubbu, S., Sarma, S., Scaria, V. 2021; 23: e01018

    View details for DOI 10.1016/j.idcr.2020.e01018

    View details for PubMedID 33288996

    View details for PubMedCentralID PMC7711173

  • Functional long non-coding and circular RNAs in zebrafish. Briefings in functional genomics Ranjan, G., Sehgal, P., Sharma, D., Scaria, V., Sivasubbu, S. 2021

    Abstract

    The utility of model organisms to understand the function of a novel transcript/genes has allowed us to delineate their molecular mechanisms in maintaining cellular homeostasis. Organisms such as zebrafish have contributed a lot in the field of developmental and disease biology. Attributable to advancement and deep transcriptomics, many new transcript isoforms and non-coding RNAs such as long noncoding RNA (lncRNA) and circular RNAs (circRNAs) have been identified and cataloged in multiple databases and many more are yet to be identified. Various methods and tools have been utilized to identify lncRNAs/circRNAs in zebrafish using deep sequencing of transcriptomes as templates. Functional analysis of a few candidates such as tie1-AS, ECAL1 and CDR1as in zebrafish provides a prospective outline to approach other known or novel lncRNA/circRNA. New genetic alteration tools like TALENS and CRISPRs have helped in probing for the molecular function of lncRNA/circRNA in zebrafish. Further latest improvements in experimental and computational techniques offer the identification of lncRNA/circRNA counterparts in humans and zebrafish thereby allowing easy modeling and analysis of function at cellular level.

    View details for DOI 10.1093/bfgp/elab014

    View details for PubMedID 33755040

  • Initial Insights Into the Genetic Epidemiology of SARS-CoV-2 Isolates From Kerala Suggest Local Spread From Limited Introductions. Frontiers in genetics Radhakrishnan, C., Divakar, M., Jain, A., Viswanathan, P., Bhoyar, R. C., Jolly, B., Imran, M., Sharma, D., Rophina, M., Ranjan, G., Sehgal, P., Jose, B., Raman, R., Kesavan, T., George, K., Mathew, S., Poovullathil, J., Keeriyatt Govindan, S., Nair, P., Vadekkandiyil, S., Gladson, V., Mohan, M., Parambath, F., Mangla, M., Shamnath, A., Sivasubbu, S., Scaria, V., Indian CoV2 Genomics Genetic 2021; 12: 630542

    Abstract

    Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. The rapid increase in the COVID-19 cases in the state of Kerala in India has necessitated the understanding of SARS-CoV-2 genetic epidemiology. We sequenced 200 samples from patients in Kerala using COVIDSeq protocol amplicon-based sequencing. The analysis identified 166 high-quality single-nucleotide variants encompassing four novel variants and 89 new variants in the Indian isolated SARS-CoV-2. Phylogenetic and haplotype analysis revealed that the virus was dominated by three distinct introductions followed by local spread suggesting recent outbreaks and that it belongs to the A2a clade. Further analysis of the functional variants revealed that two variants in the S gene associated with increased infectivity and five variants mapped in primer binding sites affect the efficacy of RT-PCR. To the best of our knowledge, this is the first and most comprehensive report of SARS-CoV-2 genetic epidemiology from Kerala.

    View details for DOI 10.3389/fgene.2021.630542

    View details for Web of Science ID 000635122300001

    View details for PubMedID 33815467

    View details for PubMedCentralID PMC8010186

  • Founder variants and population genomes-Toward precision medicine. Advances in genetics Jain, A., Sharma, D., Bajaj, A., Gupta, V., Scaria, V. 2021; 107: 121-152

    Abstract

    Human migration and community specific cultural practices have contributed to founder events and enrichment of the variants associated with genetic diseases. While many founder events in isolated populations have remained uncharacterized, the application of genomics in clinical settings as well as for population scale studies in the recent years have provided an unprecedented push towards identification of founder variants associated with human health and disease. The discovery and characterization of founder variants could have far reaching implications not only in understanding the history or genealogy of the disease, but also in implementing evidence based policies and genetic testing frameworks. This further enables precise diagnosis and prevention in an attempt towards precision medicine. This review provides an overview of founder variants along with methods and resources cataloging them. We have also discussed the public health implications and examples of prevalent disease associated founder variants in specific populations.

    View details for DOI 10.1016/bs.adgen.2020.11.004

    View details for PubMedID 33641745

  • High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing. PloS one Bhoyar, R. C., Jain, A., Sehgal, P., Divakar, M. K., Sharma, D., Imran, M., Jolly, B., Ranjan, G., Rophina, M., Sharma, S., Siwach, S., Pandhare, K., Sahoo, S., Sahoo, M., Nayak, A., Mohanty, J. N., Das, J., Bhandari, S., Mathur, S. K., Kumar, A., Sahlot, R., Rojarani, P., Lakshmi, J. V., Surekha, A., Sekhar, P. C., Mahajan, S., Masih, S., Singh, P., Kumar, V., Jose, B., Mahajan, V., Gupta, V., Gupta, R., Arumugam, P., Singh, A., Nandy, A., P V, R., Jha, R. M., Kumari, A., Gandotra, S., Rao, V., Faruq, M., Kumar, S., Reshma G, B., Varma G, N., Roy, S. S., Sengupta, A., Chattopadhyay, S., Singhal, K., Pradhan, S., Jha, D., Naushin, S., Wadhwa, S., Tyagi, N., Poojary, M., Scaria, V., Sivasubbu, S. 2021; 16 (2): e0247115

    Abstract

    The rapid emergence of coronavirus disease 2019 (COVID-19) as a global pandemic affecting millions of individuals globally has necessitated sensitive and high-throughput approaches for the diagnosis, surveillance, and determining the genetic epidemiology of SARS-CoV-2. In the present study, we used the COVIDSeq protocol, which involves multiplex-PCR, barcoding, and sequencing of samples for high-throughput detection and deciphering the genetic epidemiology of SARS-CoV-2. We used the approach on 752 clinical samples in duplicates, amounting to a total of 1536 samples which could be sequenced on a single S4 sequencing flow cell on NovaSeq 6000. Our analysis suggests a high concordance between technical duplicates and a high concordance of detection of SARS-CoV-2 between the COVIDSeq as well as RT-PCR approaches. An in-depth analysis revealed a total of six samples in which COVIDSeq detected SARS-CoV-2 in high confidence which were negative in RT-PCR. Additionally, the assay could detect SARS-CoV-2 in 21 samples and 16 samples which were classified inconclusive and pan-sarbeco positive respectively suggesting that COVIDSeq could be used as a confirmatory test. The sequencing approach also enabled insights into the evolution and genetic epidemiology of the SARS-CoV-2 samples. The samples were classified into a total of 3 clades. This study reports two lineages B.1.112 and B.1.99 for the first time in India. This study also revealed 1,143 unique single nucleotide variants and added a total of 73 novel variants identified for the first time. To the best of our knowledge, this is the first report of the COVIDSeq approach for detection and genetic epidemiology of SARS-CoV-2. Our analysis suggests that COVIDSeq could be a potential high sensitivity assay for the detection of SARS-CoV-2, with an additional advantage of enabling the genetic epidemiology of SARS-CoV-2.

    View details for DOI 10.1371/journal.pone.0247115

    View details for PubMedID 33596239

    View details for PubMedCentralID PMC7888613

  • IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic acids research Jain, A., Bhoyar, R. C., Pandhare, K., Mishra, A., Sharma, D., Imran, M., Senthivel, V., Divakar, M. K., Rophina, M., Jolly, B., Batra, A., Sharma, S., Siwach, S., Jadhao, A. G., Palande, N. V., Jha, G. N., Ashrafi, N., Mishra, P. K., A K, V., Jain, S., Dash, D., Kumar, N. S., Vanlallawma, A., Sarma, R. J., Chhakchhuak, L., Kalyanaraman, S., Mahadevan, R., Kandasamy, S., B M, P., Rajagopal, R. E., J, E. R., P, N. D., Bajaj, A., Gupta, V., Mathew, S., Goswami, S., Mangla, M., Prakash, S., Joshi, K., S, S., Gajjar, D., Soraisham, R., Yadav, R., Devi, Y. S., Gupta, A., Mukerji, M., Ramalingam, S., B K, B., Scaria, V., Sivasubbu, S. 2021; 49 (D1): D1225-D1232

    Abstract

    With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population with extensive genetic diversity, but is under-represented in the global sequencing datasets. This gave us the impetus to perform and analyze the whole genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count, allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes' (http://clingen.igib.res.in/indigen/). The IndiGenomes database will help clinicians and researchers in exploring the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population and is made freely available for academic utility. The resource has also been accessed extensively by the worldwide community since its launch.

    View details for DOI 10.1093/nar/gkaa923

    View details for PubMedID 33095885

    View details for PubMedCentralID PMC7778947

  • A genome-wide circular RNA transcriptome in rat. Biology methods & protocols Sharma, D., Sehgal, P., Sivasubbu, S., Scaria, V. 2021; 6 (1): bpab016

    Abstract

    Circular RNAs (circRNAs) are a novel class of noncoding RNAs that back-splice from 5' donor site and 3' acceptor sites to form a circular structure. A number of circRNAs have been discovered in model organisms including human, mouse, Drosophila, among other organisms. There are a few candidate-based studies on circRNAs in rat, a well-studied model organism as well. A number of pipelines have been published to identify the back splice junctions for the discovery of circRNAs but studies comparing these tools have suggested that a combination of tools would be a better approach to identify high-confidence circRNAs. The availability of a recent dataset of transcriptomes encompassing 11 tissues, 4 developmental stages, and 2 genders motivated us to explore the landscape of circRNAs in the organism in this context. In order to understand the difference among different pipelines, we employed five different combinations of tools to identify circular RNAs from the dataset. We compared the results of the different combination of tools/pipelines with respect to alignment, total number of circRNAs identified and read-coverage. In addition, we identified tissue-specific, development-stage specific and gender-specific circRNAs and further independently validated 16 circRNA junctions out of 24 selected candidates in 5 tissue samples and estimated the quantitative expression of five circRNA candidates using real-time polymerase chain reaction and our analysis suggests three candidates as tissue-enriched. This study is one of the most comprehensive studies which provides a map of circRNAs transcriptome as well as to understand the difference among different computational pipelines in rat.

    View details for DOI 10.1093/biomethods/bpab016

    View details for PubMedID 34527809

    View details for PubMedCentralID PMC8435660

  • Asymptomatic reinfection in two healthcare workers from India with genetically distinct SARS-CoV-2. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America Gupta, V., Bhoyar, R. C., Jain, A., Srivastava, S., Upadhayay, R., Imran, M., Jolly, B., Divakar, M. K., Sharma, D., Sehgal, P., Ranjan, G., Gupta, R., Scaria, V., Sivasubbu, S. 2020

    View details for DOI 10.1093/cid/ciaa1451

    View details for PubMedID 32964927

    View details for PubMedCentralID PMC7543380

  • Saliva microbiome in primary Sjögren's syndrome reveals distinct set of disease-associated microbes. Oral diseases Sharma, D., Sandhya, P., Vellarikkal, S. K., Surin, A. K., Jayarajan, R., Verma, A., Kumar, A., Ravi, R., Danda, D., Sivasubbu, S., Scaria, V. 2020; 26 (2): 295-301

    Abstract

    This study systematically aims to evaluate the salivary microbiome in patients with primary Sjögren's syndrome (pSS) using 16S rRNA sequencing approach. DNA isolation and 16S rRNA sequencing was performed on saliva of 37 pSS and 35 control (CC) samples on HiSeq 2500 platform. 16S rRNA sequence analysis was performed independently using two popular computational pipelines, QIIME and less operational taxonomic units scripts (LoTuS). There were no significant changes in the alpha diversity between saliva of patients and controls. However, four genera including Bifidobacterium, Lactobacillus, Dialister and Leptotrichia were found to be differential between the two sets, and common between both QIIME and LoTuS analysis pipelines (Fold change of 2 and p < .05). Bifidobacterium, Dialister and Lactobacillus were found to be enriched, while Leptotrichia was significantly depleted in pSS compared to the controls. Exploration of microbial diversity measures (Chao1, observed species and Shannon index) revealed a significant increase in the diversity in patients with renal tubular acidosis. An opposite trend was noted, with depletion of diversity in patients with steroids. Our analysis suggests that while no significant changes in the diversity of the salivary microbiome could be observed in Sjögren's syndrome compared to the controls, a set of four genera were significantly and consistently differential in the saliva of patients with pSS. Additionally, a difference in alpha diversity in patients with renal tubular acidosis and those on steroids was observed.

    View details for DOI 10.1111/odi.13191

    View details for PubMedID 31514257

  • Circad: a comprehensive manually curated resource of circular RNA associated with diseases. Database : the journal of biological databases and curation Rophina, M., Sharma, D., Poojary, M., Scaria, V. 2020; 2020

    Abstract

    Circular RNAs (circRNAs) are unique transcript isoforms characterized by back splicing of exon ends to form a covalently closed loop or circular conformation. These transcript isoforms are now known to be expressed in a variety of organisms across the kingdoms of life. Recent studies have shown the role of circRNAs in a number of diseases and increasing evidence points to their potential application as biomarkers in these diseases. We have created a comprehensive manually curated database of circular RNAs associated with diseases. This database is available at URL http://clingen.igib.res.in/circad/. The Database lists more than 1300 circRNAs associated with 150 diseases and mapping to 113 International Statistical Classification of Diseases (ICD) codes with evidence of association linked to published literature. The database is unique in many ways. Firstly, it provides ready-to-use primers to work with, in order to use circRNAs as biomarkers or to perform functional studies. It additionally lists the assay and PCR primer details including experimentally validated ones as a ready reference to researchers along with fold change and statistical significance. It also provides standard disease nomenclature as per the ICD codes. To the best of our knowledge, circad is the most comprehensive and updated database of disease associated circular RNAs. Availability: http://clingen.igib.res.in/circad/.

    View details for DOI 10.1093/database/baaa019

    View details for PubMedID 32219412

    View details for PubMedCentralID PMC7100626

  • Genomics of rare genetic diseases-experiences from India HUMAN GENOMICS Sivasubbu, S., Scaria, V., GUaRDIAN Consortium 2019; 13 (1): 52

    Abstract

    Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture characterized by multiple endogamous groups with specific marriage patterns, including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from rest of the world but also provides a unique advantage and niche to understand genetic diseases. Centuries of genetic isolation of population groups have amplified the founder effects, contributing to high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases in India. Rare genetic diseases are becoming a public health concern in India because a large population size of close to a billion people would essentially translate to a huge disease burden for even the rarest of the rare diseases. Genomics-based approaches have been demonstrated to accelerate the diagnosis of rare genetic diseases and reduce the socio-economic burden. The Genomics for Understanding Rare Diseases: India Alliance Network (GUaRDIAN) stands for providing genomic solutions for rare diseases in India. The consortium aims to establish a unique collaborative framework in health care planning, implementation, and delivery in the specific area of rare genetic diseases. It is a nation-wide collaborative research initiative catering to rare diseases across multiple cohorts, with over 240 clinician/scientist collaborators across 70 major medical/research centers. Within the GUaRDIAN framework, clinicians refer rare disease patients, generate whole genome or exome datasets followed by computational analysis of the data for identifying the causal pathogenic variations. The outcomes of GUaRDIAN are being translated as community services through a suitable platform providing low-cost diagnostic assays in India. In addition to GUaRDIAN, several genomic investigations for diseased and healthy population are being undertaken in the country to solve the rare disease dilemma. In summary, rare diseases contribute to a significant disease burden in India. Genomics-based solutions can enable accelerated diagnosis and management of rare diseases. We discuss how a collaborative research initiative such as GUaRDIAN can provide a nation-wide framework to cater to the rare disease community of India.

    View details for DOI 10.1186/s40246-019-0215-5

    View details for Web of Science ID 000514921600001

    View details for PubMedID 31554517

    View details for PubMedCentralID PMC6760067

  • Organellar transcriptome sequencing reveals mitochondrial localization of nuclear encoded transcripts. Mitochondrion Sabharwal, A., Sharma, D., Vellarikkal, S. K., Jayarajan, R., Verma, A., Senthivel, V., Scaria, V., Sivasubbu, S. 2019; 46: 59-68

    Abstract

    Mitochondria are organelles involved in a variety of biological functions in the cell, apart from their principal role in generation of ATP, the cellular currency of energy. The mitochondria, in spite of being compact organelles, are capable of performing complex biological functions largely because of the ability to exchange proteins, RNA, chemical metabolites and other biomolecules between cellular compartments. A close network of biomolecular interactions are known to modulate the crosstalk between the mitochondria and the nuclear genome. Apart from the small repertoire of genes encoded by the mitochondrial genome, it is now known that the functionality of the organelle is highly reliant on a number of proteins encoded by the nuclear genome, which localize to the mitochondria. With exceptions to a few anecdotal examples, the transcripts that have the potential to localize to the mitochondria have been poorly studied. We used a deep sequencing approach to identify transcripts encoded by the nuclear genome which localize to the mitoplast in a zebrafish model. We prioritized 292 candidate transcripts of nuclear origin that are potentially localized to the mitochondrial matrix. We experimentally demonstrated that the transcript encoding the nuclear encoded ribosomal protein 11 (Rpl11) localizes to the mitochondria. This study represents a comprehensive analysis of the mitochondrial localization of nuclear encoded transcripts. Our analysis has provided insights into a new layer of biomolecular pathways modulating mitochondrial-nuclear cross-talk. This provides a starting point towards understanding the role of nuclear encoded transcripts that localize to mitochondria and their influence on mitochondrial function.

    View details for DOI 10.1016/j.mito.2018.02.007

    View details for PubMedID 29486245

  • A genome-wide map of circular RNAs in adult zebrafish. Scientific reports Sharma, D., Sehgal, P., Mathew, S., Vellarikkal, S. K., Singh, A. R., Kapoor, S., Jayarajan, R., Scaria, V., Sivasubbu, S. 2019; 9 (1): 3432

    Abstract

    Circular RNAs (circRNAs) are transcript isoforms generated by back-splicing of exons and circularisation of the transcript. Recent genome-wide maps created for circular RNAs in humans and other model organisms have motivated us to explore the repertoire of circular RNAs in zebrafish, a popular model organism. We generated RNA-seq data for five major zebrafish tissues - Blood, Brain, Heart, Gills and Muscle. The repertoire RNA sequence reads left over after reference mapping to linear transcripts were used to identify unique back-spliced exons utilizing a split-mapping algorithm. Our analysis revealed 3,428 novel circRNAs in zebrafish. Further in-depth analysis suggested that majority of the circRNAs were derived from previously well-annotated protein-coding and long noncoding RNA gene loci. In addition, many of the circular RNAs showed extensive tissue specificity. We independently validated a subset of circRNAs using polymerase chain reaction (PCR) and divergent set of primers. Expression analysis using quantitative real time PCR recapitulate selected tissue specificity in the candidates studied. This study provides a comprehensive genome-wide map of circular RNAs in zebrafish tissues.

    View details for DOI 10.1038/s41598-019-39977-7

    View details for PubMedID 30837568

    View details for PubMedCentralID PMC6401160

  • Methods for Annotation and Validation of Circular RNAs from RNAseq Data. Methods in molecular biology (Clifton, N.J.) Sharma, D., Sehgal, P., Hariprakash, J., Sivasubbu, S., Scaria, V. 2019; 1912: 55-76

    Abstract

    Circular RNAs are an emerging class of transcript isoforms created by unique back splicing of exons to form a closed covalent circular structure. While initially considered as product of aberrant splicing, recent evidence suggests unique functions and conservation across evolution. While circular RNAs could be largely attributed to have little or no potential to encode for proteins, recent evidence points to at least a small subset of circular RNAs which encode for peptides. Circular RNAs are also increasingly shown to be biomarkers for a number of diseases including neurological disorders and cancer. The advent of deep sequencing has enabled large-scale identification of circular RNAs in human and other genomes. A number of computational approaches have come up in recent years to query circular RNAs on a genome-wide scale from RNA-seq data. In this chapter, we describe the application and methodology of identifying circular RNAs using three popular computational tools: FindCirc, Segemehl, and CIRI along with approaches for experimental validation of the unique splice junctions.

    View details for DOI 10.1007/978-1-4939-8982-9_3

    View details for PubMedID 30635890

  • Autologous NeoHep Derived from Chronic Hepatitis B Virus Patients' Blood Monocytes by Upregulation of c-MET Signaling. Stem cells translational medicine Bhattacharjee, J., Das, B., Sharma, D., Sahay, P., Jain, K., Mishra, A., Iyer, S., Nagpal, P., Scaria, V., Nagarajan, P., Khanduri, P., Mukhopadhyay, A., Upadhyay, P. 2017; 6 (1): 174-186

    Abstract

    In view of the escalating need for autologous cell-based therapy for treatment of liver diseases, a novel candidate has been explored in the present study. The monocytes isolated from hepatitis B surface antigen (HBsAg) nucleic acid test (NAT)-positive (HNP) blood were differentiated to hepatocyte-like cells (NeoHep) in vitro by a two-step culture procedure. The excess neutrophils present in HNP blood were removed before setting up the culture. In the first step of culture, apoptotic cells were depleted and genes involved in hypoxia were induced, which was followed by the upregulation of genes involved in the c-MET signaling pathway in the second step. The NeoHep were void of hepatitis B virus and showed expression of albumin, connexin 32, hepatocyte nuclear factor 4-α, and functions such as albumin secretion and cytochrome P450 enzyme-mediated detoxification of xenobiotics. The engraftment of NeoHep derived from HBsAg-NAT-positive blood monocytes in partially hepatectomized NOD.CB17-Prkdcscid/J mice liver and the subsequent secretion of human albumin and clotting factor VII activity in serum make NeoHep a promising candidate for cell-based therapy. Stem Cells Translational Medicine 2017;6:174-186.

    View details for DOI 10.5966/sctm.2015-0308

    View details for PubMedID 28170202

    View details for PubMedCentralID PMC5442753

  • Does the buck stop with the bugs?: an overview of microbial dysbiosis in rheumatoid arthritis. International journal of rheumatic diseases Sandhya, P., Danda, D., Sharma, D., Scaria, V. 2016; 19 (1): 8-20

    Abstract

    The human body is an environmental niche which is home to diverse co-habiting microbes collectively referred to as the human microbiome. Recent years have seen the in-depth characterization of the human microbiome and associations with diseases. Linking of the composition or number of the human microbiota with diseases and traits dates back to the original work of Elie Metchnikoff. Recent advances in genomic technologies have opened up finer details and dynamics of this new science with higher precision. The microbe-rheumatoid arthritis connection, largely related to the gut and oral microbiomes, has shown up as a result, apart from several other earlier, well-studied candidate autoimmune diseases. Although evidence favouring roles of specific microbial species, including Porphyromonas, Prevotella and Leptotrichia, has become clearer, mechanistic insights still continue to be enigmatic. Manipulating the microbes by traditional dietary modifications, probiotics, and antibiotics and by currently employed disease-modifying agents seems to modulate the disease process and its progression. In the present review, we appraise the existing information as well as the gaps in knowledge in this challenging field. We also discuss the future directions for potential clinical applications, including prevention and management of rheumatoid arthritis using microbial modifications.

    View details for DOI 10.1111/1756-185X.12728

    View details for PubMedID 26385261