I am currently a Postdoctoral Fellow with Dr. Thomas Quertermous at Stanford University. I joined the lab with more than 7 years of research experience in computational biology, during which I worked with multi-omics data across multiple diseases to gain a deeper understanding of disease identification and progression.
My background in engineering and bioinformatics provides an excellent foundation for the studies proposed in this application, which investigate the genetics and genomics of smooth muscle cell biology in the context of vascular disease. I first pursued a Bachelor's in Biotechnology at one of the premier institutes in India, Banasthali Vidyapeeth, and received my degree in 2007. After qualifying in the IIT-JAM exam in 2010, I joined the Master's in Science (Biotechnology) program at the prestigious Indian Institute of Technology Roorkee, within a program of engineering and technology. After my Master's, I joined Dr. Vinod Scaria's lab at CSIR-IGIB as a Project Fellow. During my tenure as a Project Fellow from 2012 to 2014, I had the opportunity to work with transcriptomics data from model organisms and systems including zebrafish, rat and human cell lines to understand the role of long non-coding RNAs and miRNAs. I also worked on clinical datasets of autoimmune disorders. With one and a half years of research experience and a UGC fellowship awarded through the NET-JRF examination, I continued working with Dr. Vinod Scaria to pursue my PhD. My doctoral research focused on the identification and characterization of circular RNAs, and this work has now been published in multiple manuscripts listed below. Over the years at CSIR-IGIB, I had the chance to work on interesting ideas with multiple collaborating groups. One of these collaborators was Dr. Sridhar Sivasubbu, with whom I worked to understand the transcript-level interactions between mitochondria and the nucleus, using zebrafish as a model organism.
In view of my interest in the translational aspects of biology, I had the opportunity to work as part of the GUaRDIAN Consortium with Dr. Vinod Scaria and Dr. Sridhar Sivasubbu at CSIR-IGIB. This pioneering project is the largest network of researchers and clinicians in India sequencing patient DNA to identify rare SNVs and structural variants responsible for muscular dystrophy. In the interest of advancing genomics in clinical and healthcare settings, I was selected as an Intel Fellow in 2019 to work for the Intel-IGIB collaboration focussing on “Accelerating Clinical Analysis and Interpretation of Genomic Data through advanced tools/libraries”. Our project was selected among the top 3 from 50 premier research institutes, and I was awarded the Intel-India Fellowship for a year to pursue this project. I was also part of the core team of IndiGen (Genomes for Public Health in India). With the spread of COVID-19 around the world, our group contributed by sequencing and analysing COVID-19 genomes to get a better understanding of the disease, and I had the opportunity to be part of the core team analysing the viral sequencing datasets and viral assembly.
I am extremely pleased to have joined the Quertermous lab at Stanford to study the molecular mechanisms of cardiovascular disease. The work that I am pursuing in this laboratory, and that is proposed in this application, is directly in line with my aspiration to start an independent career in scientific research, working on projects with high translational value and of interest to public health.

Honors & Awards

  • Postdoctoral Fellow, Stanford University (1-9-2020 to present)
  • Intel-India Fellow, CSIR-Institute of Genomics and Integrative Biology and Intel pvt ltd. (2019-2020)
  • Senior Research Fellow, CSIR-Institute of Genomics and Integrative Biology (2016-2019)
  • Junior Research Fellow, CSIR-Institute of Genomics and Integrative Biology (2014-2016)
  • Project Fellow, CSIR-Institute of Genomics and Integrative Biology (2012-2014)

All Publications

  • Landscape of pharmacogenetic variants associated with non-insulin antidiabetic drugs in the Indian population. BMJ open diabetes research & care Sivadas, A., Sahana, S., Jolly, B., Bhoyar, R. C., Jain, A., Sharma, D., Imran, M., Senthivel, V., Divakar, M. K., Mishra, A., Mukhopadhyay, A., Gibson, G., Narayan, K. V., Sivasubbu, S., Scaria, V., Kurpad, A. V. 2024; 12 (2)


    Genetic variants contribute to differential responses to non-insulin antidiabetic drugs (NIADs), and consequently to variable plasma glucose control. Optimal control of plasma glucose is paramount to minimizing type 2 diabetes-related long-term complications. India's distinct genetic architecture and its exploding burden of type 2 diabetes warrants a population-specific survey of NIAD-associated pharmacogenetic (PGx) variants. The recent availability of large-scale whole genomes from the Indian population provides a unique opportunity to generate a population-specific map of NIAD-associated PGx variants. We mined 1029 Indian whole genomes for PGx variants, drug-drug interaction (DDI) and drug-drug-gene interactions (DDGI) associated with 44 NIADs. Population-wise allele frequencies were estimated and compared using Fisher's exact test. Overall, we found 76 known and 52 predicted deleterious common PGx variants associated with response to type 2 diabetes therapy among Indians. We report remarkable interethnic differences in the relative cumulative counts of decreased and increased response-associated alleles across NIAD classes. Indians and South Asians showed a significant excess of decreased metformin response-associated alleles compared with other global populations. Network analysis of shared PGx genes predicts high DDI risk during coadministration of NIADs with other metabolic disease drugs. We also predict an increased CYP2C19-mediated DDGI risk for CYP3A4/3A5-metabolized NIADs, saxagliptin, linagliptin and glyburide when coadministered with proton-pump inhibitors (PPIs). Indians and South Asians have a distinct PGx profile for antidiabetes drugs, marked by an excess of poor treatment response-associated alleles for various NIAD classes. This suggests the possibility of a population-specific reduced drug response in at least some NIADs. In addition, our findings provide an actionable resource for accelerating future diabetes PGx studies in Indians and South Asians and reconsidering NIAD dosing guidelines to ensure maximum efficacy and safety in the population.

    View details for DOI 10.1136/bmjdrc-2023-003769

    View details for PubMedID 38471670

  • The genomic landscape of CYP2D6 variation in the Indian population. Pharmacogenomics Sivadas, A., Rathore, S., Sahana, S., Jolly, B., Bhoyar, R. C., Jain, A., Sharma, D., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Sivasubbu, S., Scaria, V. 2024


    Aim: The CYP2D6 gene is highly polymorphic, causing large interindividual variability in the metabolism of several clinically important drugs. Materials & methods: The authors investigated the diversity and distribution of CYP2D6 alleles in Indians using whole genome sequences (N=1518). Functional consequences were assessed using pathogenicity scores and molecular dynamics simulations. Results: The analysis revealed population-specific CYP2D6 alleles (*86, *7, *111, *112, *113, *99) and remarkable differences in variant and phenotype frequencies with global populations. The authors observed that one in three Indians could benefit from a dose alteration for psychiatric drugs with accurate CYP2D6 phenotyping. Molecular dynamics simulations revealed large conformational fluctuations, confirming the predicted reduced function of *86 and *113 alleles. Conclusion: The findings emphasize the utility of comprehensive CYP2D6 profiling for aiding precision public health.

    View details for DOI 10.2217/pgs-2023-0233

    View details for PubMedID 38426301

  • A functional genomic framework to elucidate novel causal non-alcoholic fatty liver disease genes. medRxiv : the preprint server for health sciences Saliba-Gustafsson, P., Justesen, J. M., Ranta, A., Sharma, D., Bielczyk-Maczynska, E., Li, J., Najmi, L. A., Apodaka, M., Aspichueta, P., Björck, H. M., Eriksson, P., Franco-Cereceda, A., Gloudemans, M., Mujica, E., den Hoed, M., Assimes, T. L., Quertermous, T., Carcamo-Orive, I., Park, C. Y., Knowles, J. W. 2024


    Non-alcoholic fatty liver disease (NAFLD) is the most prevalent chronic liver pathology in western countries, with serious public health consequences. Efforts to identify causal genes for NAFLD have been hampered by the relative paucity of human data from gold-standard magnetic resonance quantification of hepatic fat. To overcome insufficient sample size, genome-wide association studies using NAFLD surrogate phenotypes have been used, but only a small number of loci have been identified to date. In this study, we combined GWAS of NAFLD composite surrogate phenotypes with genetic colocalization studies followed by functional in vitro screens to identify bona fide causal genes for NAFLD. We used the UK Biobank to explore the associations of our novel NAFLD score, and genetic colocalization to prioritize putative causal genes for in vitro validation. We created a functional genomic framework to study NAFLD genes in vitro using CRISPRi. Our data identify VKORC1, TNKS, LYPLAL1 and GPAM as regulators of lipid accumulation in hepatocytes and suggest the involvement of VKORC1 in the lipid storage related to the development of NAFLD. Complementary genetic and genomic approaches are useful for the identification of NAFLD genes. Our data supports VKORC1 as a bona fide NAFLD gene. We have established a functional genomic framework to study at scale putative novel NAFLD genes from human genetic association studies.

    View details for DOI 10.1101/2024.02.03.24302258

    View details for PubMedID 38352379

    View details for PubMedCentralID PMC10863038

  • Genome-Wide Genetic Associations Prioritize Evaluation of Causal Mechanisms of Atherosclerotic Disease Risk. Arteriosclerosis, thrombosis, and vascular biology Quertermous, T., Li, D. Y., Weldy, C. S., Ramste, M., Sharma, D., Monteiro, J. P., Gu, W., Worssam, M. D., Palmisano, B. T., Park, C. Y., Cheng, P. 2024; 44 (2): 323-327


    The goal of this review is to discuss the implementation of genome-wide association studies to identify causal mechanisms of vascular disease risk. The history of genome-wide association studies is described, the use of imputation and the creation of consortia to conduct meta-analyses with sufficient power to arrive at consistent associated loci for vascular disease. Genomic methods are described that allow the identification of causal variants and causal genes and how they impact the disease process. The power of single-cell analyses to promote genome-wide association studies of causal gene function is described. Genome-wide association studies represent a paradigm shift in the study of cardiovascular disease, providing identification of genes, cellular phenotypes, and disease pathways that empower the future of targeted drug development.

    View details for DOI 10.1161/ATVBAHA.123.319480

    View details for PubMedID 38266112

  • Decoding the genetic symphony: Profiling protein-coding and long noncoding RNA expression in T-acute lymphoblastic leukemia for clinical insights. PNAS nexus Verma, D., Kapoor, S., Kumari, S., Sharma, D., Singh, J., Benjamin, M., Bakhshi, S., Seth, R., Nayak, B., Sharma, A., Pramanik, R., Palanichamy, J. K., Sivasubbu, S., Scaria, V., Arora, M., Kumar, R., Chopra, A. 2024; 3 (2): pgae011


    T-acute lymphoblastic leukemia (T-ALL) is a heterogeneous malignancy characterized by the abnormal proliferation of immature T-cell precursors. Despite advances in immunophenotypic classification, understanding the molecular landscape and its impact on patient prognosis remains challenging. In this study, we conducted comprehensive RNA sequencing in a cohort of 35 patients with T-ALL to unravel the intricate transcriptomic profile. Subsequently, we validated the prognostic relevance of 23 targets, encompassing (i) protein-coding genes-BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, and TAL1; (ii) epigenetic modifiers-DOT1L, EP300, EML4, RAG1, EZH2, and KDM6A; and (iii) long noncoding RNAs (lncRNAs)-XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, ST20, MEF2C-AS1, and MALAT1 in an independent cohort of 99 patients with T-ALL. Principal component analysis revealed distinct clusters aligning with immunophenotypic subtypes, providing insights into the molecular heterogeneity of T-ALL. The identified signature genes exhibited associations with clinicopathologic features. Survival analysis uncovered several independent predictors of patient outcomes. Higher expression of MEF2C, BAALC, HHEX, and LYL1 genes emerged as robust indicators of poor overall survival (OS), event-free survival (EFS), and relapse-free survival (RFS). Higher LMO2 expression was correlated with adverse EFS and RFS outcomes. Intriguingly, increased expression of lncRNA ST20 coupled with RAG1 demonstrated a favorable prognostic impact on OS, EFS, and RFS. Conclusively, several hitherto unreported associations of gene expression patterns with clinicopathologic features and prognosis were identified, which may help understand T-ALL's molecular pathogenesis and provide prognostic markers.

    View details for DOI 10.1093/pnasnexus/pgae011

    View details for PubMedID 38328782

    View details for PubMedCentralID PMC10847906

  • Comprehensive Integration of Multiple Single-Cell Transcriptomic Datasets Defines Distinct Cell Populations and Their Phenotypic Changes in Murine Atherosclerosis. Arteriosclerosis, thrombosis, and vascular biology Sharma, D., DeForest Worssam, M., Pedroza, A. J., Dalal, A. R., Alemany, H., Kim, H. J., Kundu, R., Fischbein, M., Cheng, P., Wirka, R., Quertermous, T. 2023


    The application of single-cell transcriptomic (single-cell RNA sequencing) analysis to the study of atherosclerosis has provided unique insights into the molecular and genetic mechanisms that mediate disease risk and pathophysiology. However, nonstandardized methodologies and relatively high costs associated with the technique have limited the size and replication of existing data sets and created disparate or contradictory findings that have fostered misunderstanding and controversy. To address these uncertainties, we have performed a conservative integration of multiple published single-cell RNA sequencing data sets into a single meta-analysis, performed extended analysis of native resident vascular cells, and used in situ hybridization to map the disease anatomic location of the identified cluster cells. To investigate the transdifferentiation of smooth muscle cells to macrophage phenotype, we have developed a classifying algorithm based on the quantification of reporter transgene expression. The reporter gene expression tool indicates that within the experimental limits of the examined studies, transdifferentiation of smooth muscle cell to the macrophage lineage is extremely rare. Validated transition smooth muscle cell phenotypes were defined by clustering, and the location of these cells was mapped to lesion anatomy with in situ hybridization. We have also characterized 5 endothelial cell phenotypes and linked these cellular species to different vascular structures and functions. Finally, we have identified a transcriptomically unique cellular phenotype that constitutes the aortic valve. Taken together, these analyses resolve a number of outstanding issues related to differing results reported with vascular disease single-cell RNA sequencing studies, and significantly extend our understanding of the role of resident vascular cells in anatomy and disease.

    View details for DOI 10.1161/ATVBAHA.123.320030

    View details for PubMedID 38152886

  • A single-cell CRISPRi platform for characterizing candidate genes relevant to metabolic disorders in human adipocytes. American journal of physiology. Cell physiology Bielczyk-Maczynska, E., Sharma, D., Blencowe, M., Saliba Gustafsson, P., Gloudemans, M. J., Yang, X., Carcamo-Orive, I., Wabitsch, M., Svensson, K. J., Park, C. Y., Quertermous, T., Knowles, J. W., Li, J. 2023


    CROP-Seq combines gene silencing using CRISPR interference with single-cell RNA sequencing. Here, we applied CROP-Seq to study adipogenesis and adipocyte biology. Human preadipocyte SGBS cell line expressing KRAB-dCas9 was transduced with a sgRNA library. Following selection, individual cells were captured using microfluidics at different timepoints during adipogenesis. Bioinformatic analysis of transcriptomic data was used to determine the knock-down effects, the dysregulated pathways, and to predict cellular phenotypes. Single-cell transcriptomes recapitulated adipogenesis states. For all targets, over 400 differentially expressed genes were identified at least at one timepoint. As a validation of our approach, the knock-down of PPARG and CEBPB (which encode key proadipogenic transcription factors) resulted in the inhibition of adipogenesis. Gene set enrichment analysis generated hypotheses regarding the molecular function of novel genes. MAFF knock-down led to downregulation of transcriptional response to proinflammatory cytokine TNF-α in preadipocytes and to decreased CXCL-16 and IL-6 secretion. TIPARP knock-down resulted in increased expression of adipogenesis markers. In summary, this powerful, hypothesis-free tool can identify novel regulators of adipogenesis, preadipocyte and adipocyte function associated with metabolic disease.

    View details for DOI 10.1152/ajpcell.00148.2023

    View details for PubMedID 37486064

  • Discovery of Transacting Long Noncoding RNAs That Regulate Smooth Muscle Cell Phenotype. Circulation research Shi, H., Nguyen, T., Zhao, Q., Cheng, P., Sharma, D., Kim, H. J., Brian Kim, J., Wirka, R., Weldy, C. S., Monteiro, J. P., Quertermous, T. 2023


    Smooth muscle cells (SMCs), the major cell type in atherosclerotic plaques, are vital in coronary artery diseases (CADs). Smooth muscle cell (SMC) phenotypic transition, which leads to the formation of various cell types in atherosclerotic plaques, is regulated by a network of genetic and epigenetic mechanisms and governs the risk of disease. The involvement of long noncoding RNAs (lncRNAs) has been increasingly identified in cardiovascular disease. However, SMC lncRNAs have not been comprehensively characterized, and their regulatory role in SMC state transition remains unknown. A discovery pipeline was constructed and applied to deeply strand-specific RNA sequencing from perturbed human coronary artery SMC with different disease-related stimuli, to allow for the detection of novel lncRNAs. The functional relevance of a select few novel lncRNAs were verified in vitro. We identified 4579 known and 13 655 de novo lncRNAs in human coronary artery SMC. Consistent with previous long noncoding RNA studies, these lncRNAs overall have fewer exons, are shorter in length than protein-coding genes (pcGenes), and have relatively low expression level. Genomic location of these long noncoding RNA is disproportionately enriched near CAD-related TFs (transcription factors), genetic loci, and gene regulators of SMC identity, suggesting the importance of their function in disease. Two de novo lncRNAs, ZEB-interacting suppressor (ZIPPOR) and TNS1-antisense (TNS1-AS2), were identified by our screen. Combining transcriptional data and in silico modeling along with in vitro validation, we identified CAD gene ZEB2 as a target through which these lncRNAs exert their function in SMC phenotypic transition. Expression of a large and diverse set of lncRNAs in human coronary artery SMC are highly dynamic in response to CAD-related stimuli. The dynamic changes in expression of these lncRNAs correspond to alterations in transcriptional programs that are relevant to CAD, suggesting a critical role for lncRNAs in SMC phenotypic transition and human atherosclerotic disease.

    View details for DOI 10.1161/CIRCRESAHA.122.321960

    View details for PubMedID 36852690

  • Whole-genome sequencing of 1029 Indian individuals reveals unique and rare structural variants. Journal of human genetics Divakar, M. K., Jain, A., Bhoyar, R. C., Senthivel, V., Jolly, B., Imran, M., Sharma, D., Bajaj, A., Gupta, V., Scaria, V., Sivasubbu, S. 2023


    Structural variants contribute to genetic variability in human genomes and they can be presented in population-specific patterns. We aimed to understand the landscape of structural variants in the genomes of healthy Indian individuals and explore their potential implications in genetic disease conditions. For the identification of structural variants, a whole genome sequencing dataset of 1029 self-declared healthy Indian individuals from the IndiGen project was analysed. Further, these variants were evaluated for potential pathogenicity and their associations with genetic diseases. We also compared our identified variations with the existing global datasets. We generated a compendium of total 38,560 high-confident structural variants, comprising 28,393 deletions, 5030 duplications, 5038 insertions, and 99 inversions. Particularly, we identified around 55% of all these variants were found to be unique to the studied population. Further analysis revealed 134 deletions with predicted pathogenic/likely pathogenic effects and their affected genes were majorly enriched for neurological disease conditions, such as intellectual disability and neurodegenerative diseases. The IndiGenomes dataset helped us to understand the unique spectrum of structural variants in the Indian population. More than half of identified variants were not present in the publicly available global dataset on structural variants. Clinically important deletions identified in IndiGenomes might aid in improving the diagnosis of unsolved genetic diseases, particularly in neurological conditions. Along with basal allele frequency data and clinically important deletions, IndiGenomes data might serve as a baseline resource for future studies on genomic structural variant analysis in the Indian population.

    View details for DOI 10.1038/s10038-023-01131-7

    View details for PubMedID 36813834

    View details for PubMedCentralID 7334194

  • Molecular mechanisms of coronary artery disease risk at the PDGFD locus. Nature communications Kim, H., Cheng, P., Travisano, S., Weldy, C., Monteiro, J. P., Kundu, R., Nguyen, T., Sharma, D., Shi, H., Lin, Y., Liu, B., Haldar, S., Jackson, S., Quertermous, T. 2023; 14 (1): 847


    Genome wide association studies for coronary artery disease (CAD) have identified a risk locus at 11q22.3. Here, we verify with mechanistic studies that rs2019090 and PDGFD represent the functional variant and gene at this locus. Further, FOXC1/C2 transcription factor binding at rs2019090 is shown to promote PDGFD transcription through the CAD promoting allele. With single cell transcriptomic and histology studies with Pdgfd knockdown in an SMC lineage tracing male atherosclerosis mouse model we find that Pdgfd promotes expansion, migration, and transition of SMC lineage cells to the chondromyocyte phenotype. Pdgfd also increases adventitial fibroblast and pericyte expression of chemokines and leukocyte adhesion molecules, which is linked to plaque macrophage recruitment. Despite these changes there is no effect of Pdgfd deletion on overall plaque burden. These findings suggest that PDGFD mediates CAD risk by promoting deleterious phenotypic changes in SMC, along with an inflammatory response that is primarily focused in the adventitia.

    View details for DOI 10.1038/s41467-023-36518-9

    View details for PubMedID 36792607

  • Molecular mechanisms of coronary artery disease risk at the PDGFD locus. bioRxiv : the preprint server for biology Kim, H., Cheng, P., Travisano, S., Weldy, C., Monteiro, J. O., Kundu, R., Nguyen, T., Sharma, D., Shi, H., Lin, Y., Liu, B., Haldar, S., Jackson, S., Quertermous, T. 2023


    Platelet derived growth factor (PDGF) signaling has been extensively studied in the context of vascular disease, but the genetics of this pathway remain to be established. Genome wide association studies (GWAS) for coronary artery disease (CAD) have identified a risk locus at 11q22.3, and we have verified with fine mapping approaches that the regulatory variant rs2019090 and PDGFD represent the functional variant and putative functional gene. Further, FOXC1/C2 transcription factor (TF) binding at rs2019090 was found to promote PDGFD transcription through the CAD promoting allele. Employing a constitutive Pdgfd knockout allele along with SMC lineage tracing in a male atherosclerosis mouse model we mapped single cell transcriptomic, cell state, and lesion anatomical changes associated with gene loss. These studies revealed that Pdgfd promotes expansion, migration, and transition of SMC lineage cells to the chondromyocyte phenotype and vascular calcification. This is in contrast to protective CAD genes TCF21, ZEB2, and SMAD3 which we have shown to promote the fibroblast-like cell transition or perturb the pattern or extent of transition to the chondromyocyte phenotype. Further, Pdgfd expressing fibroblasts and pericytes exhibited greater expression of chemokines and leukocyte adhesion molecules, consistent with observed increased macrophage recruitment to the plaque. Despite these changes there was no effect of Pdgfd deletion on SMC contribution to the fibrous cap or overall lesion burden. These findings suggest that PDGFD mediates CAD risk through promoting SMC expansion and migration, in conjunction with deleterious phenotypic changes, and through promoting an inflammatory response that is primarily focused in the adventitia where it contributes to leukocyte trafficking to the diseased vessel wall.

    View details for DOI 10.1101/2023.01.26.525789

    View details for PubMedID 36747745

  • 1029 genomes of self-declared healthy individuals from India reveal prevalent and clinically relevant cardiac ion channelopathy variants. Human genomics Bajaj, A., Senthivel, V., Bhoyar, R., Jain, A., Imran, M., Rophina, M., Divakar, M. K., Jolly, B., Verma, A., Mishra, A., Sharma, D., Deepti, S., Sharma, G., Bansal, R., Yadav, R., Scaria, V., Naik, N., Sivasubbu, S. 2022; 16 (1): 30


    The prevalence and genetic spectrum of cardiac channelopathies exhibit population-specific differences. We aimed to understand the spectrum of cardiac channelopathy-associated variations in India, which is characterised by a genetically diverse population and is largely understudied in the context of these disorders. We utilised the IndiGenomes dataset comprising 1029 whole genomes from self-declared healthy individuals as a template to filter variants in 36 genes known to cause cardiac channelopathies. Our analysis revealed 186,782 variants, of which we filtered 470 variants that were identified as possibly pathogenic (440 nonsynonymous, 30 high-confidence predicted loss of function). About 26% (124 out of 470) of these variants were unique to the Indian population as they were not reported in the global population datasets and published literature. Classification of 470 variants by ACMG/AMP guidelines unveiled 13 pathogenic/likely pathogenic (P/LP) variants mapping to 19 out of the 1029 individuals. Further query of 53 probands in an independent cohort of cardiac channelopathy, using exome sequencing, revealed the presence of 3 out of the 13 P/LP variants. The identification of p.G179Sfs*62, p.R823W and c.420 + 2 T > C variants in KCNQ1, KCNH2 and CASQ2 genes, respectively, validate the significance of the P/LP variants in the context of clinical applicability as well as for large-scale population analysis. A compendium of ACMG/AMP classified cardiac channelopathy variants in 1029 self-declared healthy Indian population was created. A conservative genotypic prevalence was estimated to be 0.9-1.8% which poses a huge public health burden for a country with large population size like India. In the majority of cases, these disorders are manageable and the risk of sudden cardiac death can be alleviated by appropriate lifestyle modifications as well as treatment regimens/clinical interventions. Clinical utility of the obtained variants was demonstrated using a cardiac channelopathy patient cohort. Our study emphasises the need for large-scale population screening to identify at-risk individuals and take preventive measures. However, we suggest cautious clinical interpretation to be exercised by taking other cardiac channelopathy risk factors into account.

    View details for DOI 10.1186/s40246-022-00402-2

    View details for PubMedID 35932045

    View details for PubMedCentralID PMC9354277

  • Landscape of Variability in Chemosensory Genes Associated With Dietary Preferences in Indian Population: Analysis of 1029 Indian Genomes. Frontiers in genetics Prakrithi, P., Jha, P., Jaiswal, J., Sharma, D., Bhoyar, R. C., Jain, A., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Scaria, V., Sivasubbu, S., Mukerji, M. 2022; 13: 878134


    Perception and preferences for food and beverages determine dietary behaviour and health outcomes. Inherent differences in chemosensory genes, ethnicity, geo-climatic conditions, and sociocultural practices are other determinants. We aimed to study the variation landscape of chemosensory genes involved in perception of taste, texture, odour, temperature and burning sensations through analysis of 1,029 genomes of the IndiGen project and diverse continental populations. SNPs from 80 chemosensory genes were studied in whole genomes of 1,029 IndiGen samples and 2054 from the 1000 Genomes project. Population genetics approaches were used to infer ancestry of IndiGen individuals, gene divergence and extent of differentiation among studied populations. 137,760 SNPs including common and rare variants were identified in IndiGenomes with 62,950 novel (46%) and 48% shared with the 1,000 Genomes. Genes associated with olfaction harbored most SNPs followed by those associated with differences in perception of salt and pungent tastes. Across species, receptors for bitter taste were the most diverse compared to others. Three predominant ancestry groups within IndiGen were identified based on population structure analysis. We also identified 1,184 variants that exhibit differences in frequency of derived alleles and high population differentiation (FST ≥0.3) in Indian populations compared to European, East Asian and African populations. Examples include ADCY10, TRPV1, RGS6, OR7D4, ITPR3, OPRM1, TCF7L2, and RUNX1. This is a first of its kind of study on baseline variations in genes that could govern cuisine designs, dietary preferences and health outcomes. This would be of enormous utility in dietary recommendations for precision nutrition both at population and individual level.

    View details for DOI 10.3389/fgene.2022.878134

    View details for PubMedID 35903357

    View details for PubMedCentralID PMC9315315

  • Comprehensive Assessment of Indian Variations in the Druggable Kinome Landscape Highlights Distinct Insights at the Sequence, Structure and Pharmacogenomic Stratum. Frontiers in pharmacology Panda, G., Mishra, N., Sharma, D., Kutum, R., Bhoyar, R. C., Jain, A., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Garg, P., Banerjee, P., Sivasubbu, S., Scaria, V., Ray, A. 2022; 13: 858345


    India confines more than 17% of the world's population and has a diverse genetic makeup with several clinically relevant rare mutations belonging to many sub-group which are undervalued in global sequencing datasets like the 1000 Genome data (1KG) containing limited samples for Indian ethnicity. Such databases are critical for the pharmaceutical and drug development industry where diversity plays a crucial role in identifying genetic disposition towards adverse drug reactions. A qualitative and comparative sequence and structural study utilizing variant information present in the recently published, largest curated Indian genome database (IndiGen) and the 1000 Genome data was performed for variants belonging to the kinase coding genes, the second most targeted group of drug targets. The sequence-level analysis identified similarities and differences among different populations based on the nsSNVs and amino acid exchange frequencies whereas a comparative structural analysis of IndiGen variants was performed with pathogenic variants reported in UniProtKB Humsavar data. The influence of these variations on structural features of the protein, such as structural stability, solvent accessibility, hydrophobicity, and the hydrogen-bond network was investigated. In-silico screening of the known drugs to these Indian variation-containing proteins reveals critical differences imparted in the strength of binding due to the variations present in the Indian population. In conclusion, this study constitutes a comprehensive investigation into the understanding of common variations present in the second largest population in the world and investigating its implications in the sequence, structural and pharmacogenomic landscape. The preliminary investigation reported in this paper, supporting the screening and detection of ADRs specific to the Indian population could aid in the development of techniques for pre-clinical and post-market screening of drug-related adverse events in the Indian population.

    View details for DOI 10.3389/fphar.2022.858345

    View details for PubMedID 35865963

    View details for PubMedCentralID PMC9294532

  • Smad3 regulates smooth muscle cell fate and mediates adverse remodeling and calcification of the atherosclerotic plaque. Nature cardiovascular research Cheng, P., Wirka, R. C., Kim, J. B., Kim, H. J., Nguyen, T., Kundu, R., Zhao, Q., Sharma, D., Pedroza, A., Nagao, M., Iyer, D., Fischbein, M. P., Quertermous, T. 2022; 1 (4): 322-333


    Atherosclerotic plaques consist mostly of smooth muscle cells (SMC), and genes that influence SMC phenotype can modulate coronary artery disease (CAD) risk. Allelic variation at 15q22.33 has been identified by genome-wide association studies to modify the risk of CAD and is associated with the expression of SMAD3 in SMC. However, the mechanism by which this gene modifies CAD risk remains poorly understood. Here we show that SMC-specific deletion of Smad3 in a murine atherosclerosis model resulted in greater plaque burden, more outward remodelling and increased vascular calcification. Single-cell transcriptomic analyses revealed that loss of Smad3 altered SMC transition cell state toward two fates: a SMC phenotype that governs both vascular remodelling and recruitment of inflammatory cells, as well as a chondromyocyte fate. Together, the findings reveal that Smad3 expression in SMC inhibits the emergence of specific SMC phenotypic transition cells that mediate adverse plaque features, including outward remodelling, monocyte recruitment, and vascular calcification.

    View details for DOI 10.1038/s44161-022-00042-8

    View details for PubMedID 36246779

    View details for PubMedCentralID PMC9560061

  • Pharmacogenomic landscape of Indian population using whole genomes. Clinical and translational science Sahana, S., Bhoyar, R. C., Sivadas, A., Jain, A., Imran, M., Rophina, M., Senthivel, V., Kumar Diwakar, M., Sharma, D., Mishra, A., Sivasubbu, S., Scaria, V. 2022


    Ethnic differences in pharmacogenomic (PGx) variants have been well documented in the literature and could significantly impact variability in response and adverse events to therapeutics. India is a large country with diverse ethnic populations of distinct genetic architecture. India's national genome sequencing initiative (IndiGen) provides a unique opportunity to explore the landscape of PGx variants using population-scale whole genome sequences. We have analyzed the IndiGen variation dataset (N = 1029 genomes) along with global population scale databases to map the most prevalent clinically actionable and potentially deleterious PGx variants among Indians. Differential frequencies for the known and novel variants were studied, and interactions of the disrupted PGx genes affecting drug responses were analyzed by performing a pathway analysis. We have highlighted significant differences in the allele frequencies of clinically actionable PGx variants in Indians when compared to the global populations. We identified 134 mostly common (allele frequency [AF] > 0.1) potentially deleterious PGx variants that could alter or inhibit the function of 102 pharmacogenes in Indians. We also estimate that, on average, each Indian individual carries eight PGx variants (single nucleotide variants) that have a direct impact on the choice of treatment or drug dosing. We have also highlighted clinically actionable PGx variants and genes for which preemptive genotyping is most recommended for the Indian population. The study has put forward the most comprehensive PGx landscape of the Indian population from whole genomes, which could enable optimized drug selection and genotype-guided prescriptions to improve therapeutic outcomes and minimize adverse events.

    View details for DOI 10.1111/cts.13153

    View details for PubMedID 35338580

  • An Alu insertion map of the Indian population: identification and analysis in 1021 genomes of the IndiGen project. NAR genomics and bioinformatics Prakrithi, P., Singhal, K., Sharma, D., Jain, A., Bhoyar, R. C., Imran, M., Senthilvel, V., Divakar, M. K., Mishra, A., Scaria, V., Sivasubbu, S., Mukerji, M. 2022; 4 (1): lqac009


    Actively retrotransposing primate-specific Alu repeats display insertion-deletion (InDel) polymorphism through their insertion at new loci. In the global datasets, Indian populations remain under-represented and so do their Alu InDels. Here, we report the genomic landscape of Alu InDels from the recently released 1021 Indian Genomes (IndiGen). We identified 9239 polymorphic Alu insertions that include private (3831), rare (3974) and common (1434) insertions with an average of 770 insertions per individual. We achieved an 89% PCR validation of the predicted genotypes in 94 samples tested. About 60% of identified InDels are unique to IndiGen when compared to other global datasets; 23% of sites were shared with both SGDP and HGSVC; among these, 58% (1289 sites) were common polymorphisms in IndiGen. The insertions show a bias for genic regions, with a preference for introns, and the associated genes show enrichment for processes like cell morphogenesis and neurogenesis (P-value < 0.05). Approximately 60% of InDels mapped to genes present in the OMIM database. Finally, we show that 558 InDels can serve as ancestry informative markers to segregate global populations. This study provides a valuable resource for baseline Alu InDels that would be useful in population genomics.

    View details for DOI 10.1093/nargab/lqac009

    View details for PubMedID 35178516

    View details for PubMedCentralID PMC8846365

  • ZEB2 Shapes the Epigenetic Landscape of Atherosclerosis Circulation Cheng, P., Wirka, R. C., Clarke, L., Zhao, Q., Kundu, R., Nguyen, T., Nair, S., Sharma, D., Kim, H., Shi, H., Assimes, T., Kim, J., Kundaje, A., Quertermous, T. 2022; 145 (6): 469–485


    Background: Smooth muscle cells (SMC) transition into a number of different phenotypes during atherosclerosis, including those that resemble fibroblasts and chondrocytes, and make up the majority of cells in the atherosclerotic plaque. To better understand the epigenetic and transcriptional mechanisms that mediate these cell state changes, and how they relate to risk for coronary artery disease (CAD), we have investigated the causality and function of transcription factors (TFs) at genome-wide associated loci. Methods: We employed CRISPR-Cas9 genome and epigenome editing to identify the causal gene and cell(s) for a complex CAD GWAS signal at 2q22.3. Subsequently, single-cell epigenetic and transcriptomic profiling in murine models and human coronary artery smooth muscle cells was employed to understand the cellular and molecular mechanism by which this CAD risk gene exerts its function. Results: CRISPR-Cas9 genome and epigenome editing showed that the complex CAD genetic signals within a genomic region at 2q22.3 lie within smooth muscle long-distance enhancers for ZEB2, a TF extensively studied in the context of epithelial mesenchymal transition (EMT) in development and cancer. ZEB2 regulates SMC phenotypic transition through chromatin remodeling that obviates accessibility and disrupts both Notch and TGFβ signaling, thus altering the epigenetic trajectory of SMC transitions. SMC-specific loss of ZEB2 resulted in an inability of transitioning SMCs to turn off contractile programming and take on a fibroblast-like phenotype, but accelerated the formation of chondromyocytes, mirroring features of high-risk atherosclerotic plaques in human coronary arteries. Conclusions: These studies identify ZEB2 as a new CAD GWAS gene that affects features of plaque vulnerability through direct effects on the epigenome, providing a new therapeutic approach to target vascular disease.

    View details for DOI 10.1161/CIRCULATIONAHA.121.057789

    View details for PubMedID 34990206

  • Genetic epidemiology of autoinflammatory disease variants in Indian population from 1029 whole genomes. Journal, genetic engineering & biotechnology Jain, A., Bhoyar, R. C., Pandhare, K., Mishra, A., Sharma, D., Imran, M., Senthivel, V., Divakar, M. K., Rophina, M., Jolly, B., Batra, A., Sharma, S., Siwach, S., Jadhao, A. G., Palande, N. V., Jha, G. N., Ashrafi, N., Mishra, P. K., A K, V., Jain, S., Dash, D., Kumar, N. S., Vanlallawma, A., Sarma, R. J., Chhakchhuak, L., Kalyanaraman, S., Mahadevan, R., Kandasamy, S., B M, P., Rajagopal, R. E., Ramya J, E., Devi P, N., Bajaj, A., Gupta, V., Mathew, S., Goswami, S., Mangla, M., Prakash, S., Joshi, K., S, S., Gajjar, D., Soraisham, R., Yadav, R., Devi, Y. S., Gupta, A., Mukerji, M., Ramalingam, S., B K, B., Scaria, V., Sivasubbu, S. 2021; 19 (1): 183


    Autoinflammatory disorders are a group of inherited inflammatory disorders caused by genetic defects in the genes that regulate the innate immune system. These have been clinically characterized based on the duration and occurrence of unprovoked fever, skin rash, and patient's ancestry. There are several autoinflammatory disorders that are found to be prevalent in a specific population and whose genetic epidemiology within the population has been well understood. However, India has a limited number of genetic studies reported for autoinflammatory disorders to date. The whole genome sequencing and analysis of 1029 Indian individuals performed under the IndiGen project prompted us to study the genetic epidemiology of autoinflammatory disorders in India. We have systematically annotated the genetic variants of 56 genes implicated in autoinflammatory disorders. These genetic variants were reclassified into five categories (i.e., pathogenic, likely pathogenic, benign, likely benign, and variant of uncertain significance (VUS)) according to the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG-AMP) guidelines. Our analysis revealed 20 pathogenic and likely pathogenic variants with significant differences in the allele frequency compared with the global population. We also found six causal founder variants in the IndiGen dataset belonging to different ancestries. We performed haplotype prediction analysis for the founder mutation haplotypes, which reveals the admixture of the South Asian population with other populations. The cumulative carrier frequency of autoinflammatory disorders in India was found to be 3.5%, which is much higher than reported. With such frequency in the Indian population, there is a great need for awareness among clinicians as well as the general public regarding autoinflammatory disorders. To the best of our knowledge, this is the first and most comprehensive population scale genetic epidemiological study being reported from India.

    View details for DOI 10.1186/s43141-021-00268-2

    View details for PubMedID 34905135

  • Asymptomatic reactivation of SARS-CoV-2 in a child with neuroblastoma characterised by whole genome sequencing. IDCases Yadav, S. P., Thakkar, D., Bhoyar, R. C., Jain, A., Wadhwa, T., Imran, M., Jolly, B., Divakar, M. K., Kapoor, R., Rastogi, N., Sharma, D., Sehgal, P., Ranjan, G., Sivasubbu, S., Sarma, S., Scaria, V. 2021; 23: e01018

    View details for DOI 10.1016/j.idcr.2020.e01018

    View details for PubMedID 33288996

    View details for PubMedCentralID PMC7711173

  • Functional long non-coding and circular RNAs in zebrafish. Briefings in functional genomics Ranjan, G., Sehgal, P., Sharma, D., Scaria, V., Sivasubbu, S. 2021


    The utility of model organisms to understand the function of novel transcripts/genes has allowed us to delineate their molecular mechanisms in maintaining cellular homeostasis. Organisms such as zebrafish have contributed substantially to the field of developmental and disease biology. Owing to advances in deep transcriptomics, many new transcript isoforms and non-coding RNAs such as long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) have been identified and cataloged in multiple databases, and many more are yet to be identified. Various methods and tools have been utilized to identify lncRNAs/circRNAs in zebrafish using deep sequencing of transcriptomes as templates. Functional analysis of a few candidates such as tie1-AS, ECAL1 and CDR1as in zebrafish provides a prospective outline to approach other known or novel lncRNAs/circRNAs. New genetic alteration tools like TALENs and CRISPR have helped in probing the molecular function of lncRNAs/circRNAs in zebrafish. Furthermore, the latest improvements in experimental and computational techniques enable the identification of lncRNA/circRNA counterparts in humans and zebrafish, thereby allowing easy modeling and analysis of function at the cellular level.

    View details for DOI 10.1093/bfgp/elab014

    View details for PubMedID 33755040

  • Initial Insights Into the Genetic Epidemiology of SARS-CoV-2 Isolates From Kerala Suggest Local Spread From Limited Introductions FRONTIERS IN GENETICS Radhakrishnan, C., Divakar, M., Jain, A., Viswanathan, P., Bhoyar, R. C., Jolly, B., Imran, M., Sharma, D., Rophina, M., Ranjan, G., Sehgal, P., Jose, B., Raman, R., Kesavan, T., George, K., Mathew, S., Poovullathil, J., Keeriyatt Govindan, S., Nair, P., Vadekkandiyil, S., Gladson, V., Mohan, M., Parambath, F., Mangla, M., Shamnath, A., Sivasubbu, S., Scaria, V., Indian CoV2 Genomics Genetic 2021; 12: 630542


    Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. The rapid increase in COVID-19 cases in the state of Kerala in India has necessitated the understanding of SARS-CoV-2 genetic epidemiology. We sequenced 200 samples from patients in Kerala using amplicon-based sequencing with the COVIDSeq protocol. The analysis identified 166 high-quality single-nucleotide variants encompassing four novel variants and 89 variants new to Indian SARS-CoV-2 isolates. Phylogenetic and haplotype analysis revealed that the virus was dominated by three distinct introductions followed by local spread, suggesting recent outbreaks, and that it belongs to the A2a clade. Further analysis of the functional variants revealed two variants in the S gene associated with increased infectivity and five variants mapped to primer binding sites that affect the efficacy of RT-PCR. To the best of our knowledge, this is the first and most comprehensive report of SARS-CoV-2 genetic epidemiology from Kerala.

    View details for DOI 10.3389/fgene.2021.630542

    View details for Web of Science ID 000635122300001

    View details for PubMedID 33815467

    View details for PubMedCentralID PMC8010186

  • Founder variants and population genomes-Toward precision medicine. Advances in genetics Jain, A., Sharma, D., Bajaj, A., Gupta, V., Scaria, V. 2021; 107: 121-152


    Human migration and community specific cultural practices have contributed to founder events and enrichment of the variants associated with genetic diseases. While many founder events in isolated populations have remained uncharacterized, the application of genomics in clinical settings as well as for population scale studies in the recent years have provided an unprecedented push towards identification of founder variants associated with human health and disease. The discovery and characterization of founder variants could have far reaching implications not only in understanding the history or genealogy of the disease, but also in implementing evidence based policies and genetic testing frameworks. This further enables precise diagnosis and prevention in an attempt towards precision medicine. This review provides an overview of founder variants along with methods and resources cataloging them. We have also discussed the public health implications and examples of prevalent disease associated founder variants in specific populations.

    View details for DOI 10.1016/bs.adgen.2020.11.004

    View details for PubMedID 33641745

  • High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing. PloS one Bhoyar, R. C., Jain, A., Sehgal, P., Divakar, M. K., Sharma, D., Imran, M., Jolly, B., Ranjan, G., Rophina, M., Sharma, S., Siwach, S., Pandhare, K., Sahoo, S., Sahoo, M., Nayak, A., Mohanty, J. N., Das, J., Bhandari, S., Mathur, S. K., Kumar, A., Sahlot, R., Rojarani, P., Lakshmi, J. V., Surekha, A., Sekhar, P. C., Mahajan, S., Masih, S., Singh, P., Kumar, V., Jose, B., Mahajan, V., Gupta, V., Gupta, R., Arumugam, P., Singh, A., Nandy, A., P V, R., Jha, R. M., Kumari, A., Gandotra, S., Rao, V., Faruq, M., Kumar, S., Reshma G, B., Varma G, N., Roy, S. S., Sengupta, A., Chattopadhyay, S., Singhal, K., Pradhan, S., Jha, D., Naushin, S., Wadhwa, S., Tyagi, N., Poojary, M., Scaria, V., Sivasubbu, S. 2021; 16 (2): e0247115


    The rapid emergence of coronavirus disease 2019 (COVID-19) as a global pandemic affecting millions of individuals globally has necessitated sensitive and high-throughput approaches for the diagnosis, surveillance, and determining the genetic epidemiology of SARS-CoV-2. In the present study, we used the COVIDSeq protocol, which involves multiplex-PCR, barcoding, and sequencing of samples for high-throughput detection and deciphering the genetic epidemiology of SARS-CoV-2. We used the approach on 752 clinical samples in duplicates, amounting to a total of 1536 samples which could be sequenced on a single S4 sequencing flow cell on NovaSeq 6000. Our analysis suggests a high concordance between technical duplicates and a high concordance of detection of SARS-CoV-2 between the COVIDSeq as well as RT-PCR approaches. An in-depth analysis revealed a total of six samples in which COVIDSeq detected SARS-CoV-2 in high confidence which were negative in RT-PCR. Additionally, the assay could detect SARS-CoV-2 in 21 samples and 16 samples which were classified inconclusive and pan-sarbeco positive respectively suggesting that COVIDSeq could be used as a confirmatory test. The sequencing approach also enabled insights into the evolution and genetic epidemiology of the SARS-CoV-2 samples. The samples were classified into a total of 3 clades. This study reports two lineages B.1.112 and B.1.99 for the first time in India. This study also revealed 1,143 unique single nucleotide variants and added a total of 73 novel variants identified for the first time. To the best of our knowledge, this is the first report of the COVIDSeq approach for detection and genetic epidemiology of SARS-CoV-2. Our analysis suggests that COVIDSeq could be a potential high sensitivity assay for the detection of SARS-CoV-2, with an additional advantage of enabling the genetic epidemiology of SARS-CoV-2.

    View details for DOI 10.1371/journal.pone.0247115

    View details for PubMedID 33596239

    View details for PubMedCentralID PMC7888613

  • IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic acids research Jain, A., Bhoyar, R. C., Pandhare, K., Mishra, A., Sharma, D., Imran, M., Senthivel, V., Divakar, M. K., Rophina, M., Jolly, B., Batra, A., Sharma, S., Siwach, S., Jadhao, A. G., Palande, N. V., Jha, G. N., Ashrafi, N., Mishra, P. K., A K, V., Jain, S., Dash, D., Kumar, N. S., Vanlallawma, A., Sarma, R. J., Chhakchhuak, L., Kalyanaraman, S., Mahadevan, R., Kandasamy, S., B M, P., Rajagopal, R. E., J, E. R., P, N. D., Bajaj, A., Gupta, V., Mathew, S., Goswami, S., Mangla, M., Prakash, S., Joshi, K., S, S., Gajjar, D., Soraisham, R., Yadav, R., Devi, Y. S., Gupta, A., Mukerji, M., Ramalingam, S., B K, B., Scaria, V., Sivasubbu, S. 2021; 49 (D1): D1225-D1232


    With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population with extensive genetic diversity, but is under-represented in the global sequencing datasets. This gave us the impetus to perform and analyze the whole genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count, allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes'. The IndiGenomes database will help clinicians and researchers in exploring the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population and is made freely available for academic utility. The resource has also been accessed extensively by the worldwide community since its launch.

    View details for DOI 10.1093/nar/gkaa923

    View details for PubMedID 33095885

    View details for PubMedCentralID PMC7778947

  • A genome-wide circular RNA transcriptome in rat. Biology methods & protocols Sharma, D., Sehgal, P., Sivasubbu, S., Scaria, V. 2021; 6 (1): bpab016


    Circular RNAs (circRNAs) are a novel class of noncoding RNAs formed by back-splicing between 5' donor and 3' acceptor sites to produce a circular structure. A number of circRNAs have been discovered in organisms including human, mouse and Drosophila, among others. There are also a few candidate-based studies on circRNAs in rat, a well-studied model organism. A number of pipelines have been published to identify the back-splice junctions for the discovery of circRNAs, but studies comparing these tools have suggested that a combination of tools would be a better approach to identify high-confidence circRNAs. The availability of a recent dataset of transcriptomes encompassing 11 tissues, 4 developmental stages, and 2 genders motivated us to explore the landscape of circRNAs in the organism in this context. In order to understand the differences among pipelines, we employed five different combinations of tools to identify circular RNAs from the dataset. We compared the results of the different combinations of tools/pipelines with respect to alignment, total number of circRNAs identified and read coverage. In addition, we identified tissue-specific, development-stage-specific and gender-specific circRNAs, independently validated 16 circRNA junctions out of 24 selected candidates in 5 tissue samples, and estimated the quantitative expression of five circRNA candidates using real-time polymerase chain reaction; our analysis suggests three candidates as tissue-enriched. This study is one of the most comprehensive to provide a map of the circRNA transcriptome in rat and to characterize the differences among computational pipelines.

    View details for DOI 10.1093/biomethods/bpab016

    View details for PubMedID 34527809

    View details for PubMedCentralID PMC8435660

  • Asymptomatic reinfection in two healthcare workers from India with genetically distinct SARS-CoV-2. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America Gupta, V., Bhoyar, R. C., Jain, A., Srivastava, S., Upadhayay, R., Imran, M., Jolly, B., Divakar, M. K., Sharma, D., Sehgal, P., Ranjan, G., Gupta, R., Scaria, V., Sivasubbu, S. 2020

    View details for DOI 10.1093/cid/ciaa1451

    View details for PubMedID 32964927

    View details for PubMedCentralID PMC7543380

  • Saliva microbiome in primary Sjögren's syndrome reveals distinct set of disease-associated microbes. Oral diseases Sharma, D., Sandhya, P., Vellarikkal, S. K., Surin, A. K., Jayarajan, R., Verma, A., Kumar, A., Ravi, R., Danda, D., Sivasubbu, S., Scaria, V. 2020; 26 (2): 295-301


    This study aims to systematically evaluate the salivary microbiome in patients with primary Sjögren's syndrome (pSS) using a 16S rRNA sequencing approach. DNA isolation and 16S rRNA sequencing were performed on saliva of 37 pSS and 35 control (CC) samples on the HiSeq 2500 platform. 16S rRNA sequence analysis was performed independently using two popular computational pipelines, QIIME and less operational taxonomic units scripts (LoTuS). There were no significant changes in the alpha diversity between saliva of patients and controls. However, four genera including Bifidobacterium, Lactobacillus, Dialister and Leptotrichia were found to be differential between the two sets, and common between both QIIME and LoTuS analysis pipelines (fold change of 2 and p < .05). Bifidobacterium, Dialister and Lactobacillus were found to be enriched, while Leptotrichia was significantly depleted in pSS compared to the controls. Exploration of microbial diversity measures (Chao1, observed species and Shannon index) revealed a significant increase in the diversity in patients with renal tubular acidosis. An opposite trend was noted, with depletion of diversity in patients on steroids. Our analysis suggests that while no significant changes in the diversity of the salivary microbiome could be observed in Sjögren's syndrome compared to the controls, a set of four genera were significantly and consistently differential in the saliva of patients with pSS. Additionally, a difference in alpha diversity in patients with renal tubular acidosis and those on steroids was observed.

    View details for DOI 10.1111/odi.13191

    View details for PubMedID 31514257

  • Circad: a comprehensive manually curated resource of circular RNA associated with diseases. Database : the journal of biological databases and curation Rophina, M., Sharma, D., Poojary, M., Scaria, V. 2020; 2020


    Circular RNAs (circRNAs) are unique transcript isoforms characterized by back splicing of exon ends to form a covalently closed loop or circular conformation. These transcript isoforms are now known to be expressed in a variety of organisms across the kingdoms of life. Recent studies have shown the role of circRNAs in a number of diseases and increasing evidence points to their potential application as biomarkers in these diseases. We have created a comprehensive manually curated database of circular RNAs associated with diseases. This database is available online. The database lists more than 1300 circRNAs associated with 150 diseases and mapping to 113 International Statistical Classification of Diseases (ICD) codes with evidence of association linked to published literature. The database is unique in many ways. Firstly, it provides ready-to-use primers to work with, in order to use circRNAs as biomarkers or to perform functional studies. It additionally lists the assay and PCR primer details including experimentally validated ones as a ready reference to researchers along with fold change and statistical significance. It also provides standard disease nomenclature as per the ICD codes. To the best of our knowledge, circad is the most comprehensive and updated database of disease associated circular RNAs.

    View details for DOI 10.1093/database/baaa019

    View details for PubMedID 32219412

    View details for PubMedCentralID PMC7100626

  • Genomics of rare genetic diseases-experiences from India HUMAN GENOMICS Sivasubbu, S., Scaria, V., GUaRDIAN Consortium 2019; 13 (1): 52


    Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture characterized by multiple endogamous groups with specific marriage patterns, including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from the rest of the world but also provides a unique advantage and niche to understand genetic diseases. Centuries of genetic isolation of population groups have amplified the founder effects, contributing to high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases in India. Rare genetic diseases are becoming a public health concern in India because a large population size of close to a billion people would essentially translate to a huge disease burden for even the rarest of the rare diseases. Genomics-based approaches have been demonstrated to accelerate the diagnosis of rare genetic diseases and reduce the socio-economic burden. The Genomics for Understanding Rare Diseases: India Alliance Network (GUaRDIAN) stands for providing genomic solutions for rare diseases in India. The consortium aims to establish a unique collaborative framework in health care planning, implementation, and delivery in the specific area of rare genetic diseases. It is a nation-wide collaborative research initiative catering to rare diseases across multiple cohorts, with over 240 clinician/scientist collaborators across 70 major medical/research centers. Within the GUaRDIAN framework, clinicians refer rare disease patients and generate whole genome or exome datasets, followed by computational analysis of the data for identifying the causal pathogenic variations. The outcomes of GUaRDIAN are being translated as community services through a suitable platform providing low-cost diagnostic assays in India. In addition to GUaRDIAN, several genomic investigations for diseased and healthy populations are being undertaken in the country to solve the rare disease dilemma. In summary, rare diseases contribute to a significant disease burden in India. Genomics-based solutions can enable accelerated diagnosis and management of rare diseases. We discuss how a collaborative research initiative such as GUaRDIAN can provide a nation-wide framework to cater to the rare disease community of India.

    View details for DOI 10.1186/s40246-019-0215-5

    View details for Web of Science ID 000514921600001

    View details for PubMedID 31554517

    View details for PubMedCentralID PMC6760067

  • Organellar transcriptome sequencing reveals mitochondrial localization of nuclear encoded transcripts. Mitochondrion Sabharwal, A., Sharma, D., Vellarikkal, S. K., Jayarajan, R., Verma, A., Senthivel, V., Scaria, V., Sivasubbu, S. 2019; 46: 59-68


    Mitochondria are organelles involved in a variety of biological functions in the cell, apart from their principal role in generation of ATP, the cellular currency of energy. The mitochondria, in spite of being compact organelles, are capable of performing complex biological functions largely because of the ability to exchange proteins, RNA, chemical metabolites and other biomolecules between cellular compartments. A close network of biomolecular interactions is known to modulate the crosstalk between the mitochondria and the nuclear genome. Apart from the small repertoire of genes encoded by the mitochondrial genome, it is now known that the functionality of the organelle is highly reliant on a number of proteins encoded by the nuclear genome, which localize to the mitochondria. With the exception of a few anecdotal examples, the transcripts that have the potential to localize to the mitochondria have been poorly studied. We used a deep sequencing approach to identify transcripts encoded by the nuclear genome which localize to the mitoplast in a zebrafish model. We prioritized 292 candidate transcripts of nuclear origin that are potentially localized to the mitochondrial matrix. We experimentally demonstrated that the transcript encoding the nuclear encoded ribosomal protein 11 (Rpl11) localizes to the mitochondria. This study represents a comprehensive analysis of the mitochondrial localization of nuclear encoded transcripts. Our analysis has provided insights into a new layer of biomolecular pathways modulating mitochondrial-nuclear cross-talk. This provides a starting point towards understanding the role of nuclear encoded transcripts that localize to mitochondria and their influence on mitochondrial function.

    View details for DOI 10.1016/j.mito.2018.02.007

    View details for PubMedID 29486245

  • A genome-wide map of circular RNAs in adult zebrafish. Scientific reports Sharma, D., Sehgal, P., Mathew, S., Vellarikkal, S. K., Singh, A. R., Kapoor, S., Jayarajan, R., Scaria, V., Sivasubbu, S. 2019; 9 (1): 3432


    Circular RNAs (circRNAs) are transcript isoforms generated by back-splicing of exons and circularisation of the transcript. Recent genome-wide maps created for circular RNAs in humans and other model organisms have motivated us to explore the repertoire of circular RNAs in zebrafish, a popular model organism. We generated RNA-seq data for five major zebrafish tissues - Blood, Brain, Heart, Gills and Muscle. The repertoire RNA sequence reads left over after reference mapping to linear transcripts were used to identify unique back-spliced exons utilizing a split-mapping algorithm. Our analysis revealed 3,428 novel circRNAs in zebrafish. Further in-depth analysis suggested that the majority of the circRNAs were derived from previously well-annotated protein-coding and long noncoding RNA gene loci. In addition, many of the circular RNAs showed extensive tissue specificity. We independently validated a subset of circRNAs using polymerase chain reaction (PCR) and a divergent set of primers. Expression analysis using quantitative real time PCR recapitulated selected tissue specificity in the candidates studied. This study provides a comprehensive genome-wide map of circular RNAs in zebrafish tissues.

    View details for DOI 10.1038/s41598-019-39977-7

    View details for PubMedID 30837568

    View details for PubMedCentralID PMC6401160

  • Methods for Annotation and Validation of Circular RNAs from RNAseq Data. Methods in molecular biology (Clifton, N.J.) Sharma, D., Sehgal, P., Hariprakash, J., Sivasubbu, S., Scaria, V. 2019; 1912: 55-76


    Circular RNAs are an emerging class of transcript isoforms created by unique back splicing of exons to form a closed covalent circular structure. While initially considered a product of aberrant splicing, recent evidence suggests unique functions and conservation across evolution. While circular RNAs have largely been considered to have little or no potential to encode proteins, recent evidence points to at least a small subset of circular RNAs which encode peptides. Circular RNAs are also increasingly shown to be biomarkers for a number of diseases including neurological disorders and cancer. The advent of deep sequencing has enabled large-scale identification of circular RNAs in human and other genomes. A number of computational approaches have come up in recent years to query circular RNAs on a genome-wide scale from RNA-seq data. In this chapter, we describe the application and methodology of identifying circular RNAs using three popular computational tools: FindCirc, Segemehl, and CIRI along with approaches for experimental validation of the unique splice junctions.

    View details for DOI 10.1007/978-1-4939-8982-9_3

    View details for PubMedID 30635890

  • Autologous NeoHep Derived from Chronic Hepatitis B Virus Patients' Blood Monocytes by Upregulation of c-MET Signaling. Stem cells translational medicine Bhattacharjee, J., Das, B., Sharma, D., Sahay, P., Jain, K., Mishra, A., Iyer, S., Nagpal, P., Scaria, V., Nagarajan, P., Khanduri, P., Mukhopadhyay, A., Upadhyay, P. 2017; 6 (1): 174-186


    In view of the escalating need for autologous cell-based therapy for treatment of liver diseases, a novel candidate has been explored in the present study. The monocytes isolated from hepatitis B surface antigen (HBsAg) nucleic acid test (NAT)-positive (HNP) blood were differentiated to hepatocyte-like cells (NeoHep) in vitro by a two-step culture procedure. The excess neutrophils present in HNP blood were removed before setting up the culture. In the first step of culture, apoptotic cells were depleted and genes involved in hypoxia were induced, which was followed by the upregulation of genes involved in the c-MET signaling pathway in the second step. The NeoHep were void of hepatitis B virus and showed expression of albumin, connexin 32, hepatocyte nuclear factor 4-α, and functions such as albumin secretion and cytochrome P450 enzyme-mediated detoxification of xenobiotics. The engraftment of NeoHep derived from HBsAg-NAT-positive blood monocytes in partially hepatectomized NOD.CB17-Prkdcscid/J mice liver and the subsequent secretion of human albumin and clotting factor VII activity in serum make NeoHep a promising candidate for cell-based therapy.

    View details for DOI 10.5966/sctm.2015-0308

    View details for PubMedID 28170202

    View details for PubMedCentralID PMC5442753

  • Does the buck stop with the bugs?: an overview of microbial dysbiosis in rheumatoid arthritis. International journal of rheumatic diseases Sandhya, P., Danda, D., Sharma, D., Scaria, V. 2016; 19 (1): 8-20


    The human body is an environmental niche which is home to diverse co-habiting microbes collectively referred to as the human microbiome. Recent years have seen the in-depth characterization of the human microbiome and associations with diseases. Linking of the composition or number of the human microbiota with diseases and traits dates back to the original work of Elie Metchnikoff. Recent advances in genomic technologies have opened up finer details and dynamics of this new science with higher precision. The microbe-rheumatoid arthritis connection, largely related to the gut and oral microbiomes, has shown up as a result - apart from several other earlier, well-studied candidate autoimmune diseases. Although evidence favouring roles of specific microbial species, including Porphyromonas, Prevotella and Leptotrichia, has become clearer, mechanistic insights still continue to be enigmatic. Manipulating the microbes by traditional dietary modifications, probiotics, and antibiotics and by currently employed disease-modifying agents seems to modulate the disease process and its progression. In the present review, we appraise the existing information as well as the gaps in knowledge in this challenging field. We also discuss the future directions for potential clinical applications, including prevention and management of rheumatoid arthritis using microbial modifications.

    View details for DOI 10.1111/1756-185X.12728

    View details for PubMedID 26385261