Nilotpal Sanyal completed his PhD in Statistics at the University of Missouri, Columbia, with a dissertation titled 'Bayesian fMRI data analysis and Bayesian optimal design'. He then spent a year as a visiting scientist at the Indian Statistical Institute, Kolkata, after which he began his postdoctoral career in the USA as a postdoctoral associate at Texas A&M University, working on high-dimensional variable selection methods. He later joined the Department of Radiology at the University of California, San Diego, as a postdoctoral scholar working on genome-wide association study (GWAS) analysis. He is now a postdoctoral scholar in the Biomedical Informatics division of the Department of Medicine at Stanford University. His present research topics include gene-by-environment interaction analysis in case-control studies and screening strategies for lung cancer.
Master of Science, University of Calcutta (2007)
Bachelor of Science, University of Calcutta (2004)
Doctor of Philosophy, University of Missouri, Columbia (2012)
Tobacco Smoking and Risk of Second Primary Lung Cancer.
Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer
Lung cancer survivors are at high risk of a second primary lung cancer (SPLC). However, SPLC risk factors have not been established, and the impact of tobacco smoking remains controversial. We examined risk factors for SPLC across multiple epidemiologic cohorts and assessed the impact of smoking cessation on reducing SPLC risk. We analyzed data from 7,059 participants in the Multiethnic Cohort (MEC) diagnosed with an initial primary lung cancer (IPLC) between 1993 and 2017. Cause-specific proportional hazards models estimated SPLC risk. We conducted validation studies using the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO, N=3,423 IPLC cases) and European Prospective Investigation into Cancer and Nutrition (EPIC, N=4,731 IPLC cases) cohorts and pooled the SPLC risk estimates using random-effects meta-analysis. Overall, 163 (2.3%) MEC cases developed an SPLC. Smoking pack-years (HR 1.18 per 10 pack-years; P<0.001) and smoking intensity (HR 1.30 per 10 cigarettes per day (CPD); P<0.001) were significantly associated with increased SPLC risk. Individuals who met the 2013 U.S. Preventive Services Task Force (USPSTF) screening criteria at IPLC diagnosis also had an increased SPLC risk (HR 1.92; P<0.001). Validation studies with PLCO and EPIC showed consistent results. Meta-analysis yielded pooled HRs of 1.16 per 10 pack-years (Pmeta<0.001), 1.25 per 10 CPD (Pmeta<0.001), and 1.99 (Pmeta<0.001) for meeting the USPSTF criteria. In MEC, smoking cessation after IPLC diagnosis was associated with an 83% reduction in SPLC risk (HR 0.17; P<0.001). Tobacco smoking is a risk factor for SPLC. Smoking cessation after IPLC diagnosis may reduce the risk of SPLC. Additional strategies for SPLC surveillance and screening are warranted.
DOI: 10.1016/j.jtho.2021.02.024
PubMed ID: 33722709
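The abstract pools cohort-specific SPLC hazard ratios with a random-effects meta-analysis. As a hedged illustration, the sketch below uses the DerSimonian-Laird estimator on log hazard ratios; the choice of estimator is an assumption (the abstract does not name one), and the per-cohort HRs and standard errors in the usage lines are hypothetical placeholders, not values from the study.

```python
import math
from typing import List, Tuple

def dersimonian_laird(log_hr: List[float], se: List[float]) -> Tuple[float, float, float]:
    """Pool log hazard ratios with the DerSimonian-Laird random-effects estimator.

    Returns (pooled_log_hr, pooled_se, tau2), where tau2 is the estimated
    between-cohort variance.
    """
    if len(log_hr) != len(se) or len(log_hr) < 2:
        raise ValueError("need at least two cohorts with matching standard errors")
    w = [1.0 / s ** 2 for s in se]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hr)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hr))  # Cochran's Q
    df = len(log_hr) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # method-of-moments, truncated at 0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_hr)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical HRs per 10 pack-years and SEs of log(HR) for three cohorts (NOT study data).
hrs = [1.18, 1.12, 1.15]
ses = [0.04, 0.05, 0.06]
pooled, pooled_se, tau2 = dersimonian_laird([math.log(h) for h in hrs], ses)
print(f"pooled HR = {math.exp(pooled):.3f}, tau^2 = {tau2:.4f}, "
      f"P = {two_sided_p(pooled / pooled_se):.2e}")
```

When Cochran's Q does not exceed its degrees of freedom, tau^2 is truncated to zero and the pooled estimate coincides with the fixed-effect one.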
- A robust test for additive gene-environment interaction under the trend effect of genotype using an empirical Bayes-type shrinkage estimator. American Journal of Epidemiology 2021
A Likelihood Ratio Test for Gene-Environment Interaction Based on the Trend Effect of Genotype Under an Additive Risk Model Using the Gene-Environment Independence Assumption.
American Journal of Epidemiology
Several statistical methods have been proposed for testing gene (G)-environment (E) interactions under additive risk models using genome-wide association study data. However, these approaches make strong assumptions about the underlying genetic model, such as dominant or recessive effects, and are known to be less robust when the true genetic model is unknown. We aim to develop a robust trend test, employing a likelihood ratio test, for detecting G-E interaction under an additive risk model while incorporating the G-E independence assumption to increase power. We used a constrained likelihood to impose two sets of constraints: (i) the linear trend effect of genotype and (ii) the additive joint effects of G and E. To incorporate the G-E independence assumption, a retrospective likelihood was used instead of a standard prospective likelihood. Numerical investigation suggests that the proposed tests are more powerful than tests assuming dominant, recessive, or general models under various parameter settings and under both likelihoods. Incorporating the independence assumption enhances efficiency 2.5-fold. We applied the proposed methods to examine gene-smoking interaction for lung cancer and gene-APOE*4 interaction for Alzheimer's disease, identifying two interactions, between APOE*4 and the MS4A and BIN1 loci, at genome-wide significance that were replicated using independent data.
DOI: 10.1093/aje/kwaa132
PubMed ID: 32870973
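To make the two constraints concrete, the sketch below writes down one assumed parameterization of an additive-trend odds-ratio model: OR(g, e) = 1 + g·θ_G + e·θ_E + g·e·ψ, with g the minor-allele count and e a binary exposure. Under this assumption, the relative excess risk due to interaction (RERI) for genotype g equals g·ψ, so the null of no additive interaction is ψ = 0 and the likelihood ratio statistic has one degree of freedom. The parameterization and function names are hypothetical illustrations, not the paper's exact model, and fitting the prospective or retrospective likelihoods is not shown.

```python
import math

def odds_ratio(g: int, e: int, theta_g: float, theta_e: float, psi: float) -> float:
    """OR for (g, e) vs (0, 0) under an ASSUMED additive-trend model.

    Excess odds (OR - 1) are linear in the allele count g, additive in G and E,
    plus a trend interaction term g * e * psi.
    """
    if g not in (0, 1, 2) or e not in (0, 1):
        raise ValueError("g must be 0, 1 or 2 and e must be 0 or 1")
    value = 1.0 + g * theta_g + e * theta_e + g * e * psi
    if value <= 0.0:
        raise ValueError("parameters imply a non-positive odds ratio")
    return value

def reri(g: int, theta_g: float, theta_e: float, psi: float) -> float:
    """Relative excess risk due to interaction for genotype g vs 0 (equals g * psi here)."""
    return (odds_ratio(g, 1, theta_g, theta_e, psi)
            - odds_ratio(g, 0, theta_g, theta_e, psi)
            - odds_ratio(0, 1, theta_g, theta_e, psi) + 1.0)

def lrt_pvalue(loglik_null: float, loglik_alt: float) -> float:
    """P-value of a 1-df likelihood ratio test against a chi-square(1) reference."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For 1 df, P(chi2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

print(reri(2, theta_g=0.3, theta_e=0.5, psi=0.4))
# Hypothetical maximized log-likelihoods under psi = 0 and psi free.
print(lrt_pvalue(loglik_null=-1502.7, loglik_alt=-1499.1))
```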
Identification of genetic heterogeneity of Alzheimer's disease across age
Neurobiology of Aging
2019; 84: 243.e1–243.e9
The risk conferred by APOE for Alzheimer's disease (AD) is modified by age. Beyond APOE, the polygenic architecture may also be heterogeneous across age. We aim to investigate age-related genetic heterogeneity of AD and identify genomic loci with differential effects across age. Stratified gene-based genome-wide association studies and polygenic variation analyses were performed in the younger (60–79 years, N = 14,895) and older (≥80 years, N = 6559) age-at-onset groups using Alzheimer's Disease Genetics Consortium data. We showed a moderate genetic correlation (rg = 0.64) between the two age groups, supporting genetic heterogeneity. Heritability explained by variants on chromosome 19 (harboring APOE) was significantly larger in the younger than in the older onset group (p < 0.05). The APOE region, BIN1, OR2S2, MS4A4E, and PICALM were identified at gene-based genome-wide significance (p < 2.73 × 10⁻⁶), with larger effects at younger age (except MS4A4E). For the novel gene OR2S2, we further performed leave-one-out analyses, which showed consistent effects across subsamples. Our results suggest that using genetically more homogeneous individuals may help detect additional susceptibility loci.
DOI: 10.1016/j.neurobiolaging.2019.02.022
Web of Science ID: 000501576800050
PubMed ID: 30979435
PubMed Central ID: PMC6783343
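The leave-one-out check reported for OR2S2 can be sketched as recomputing a pooled effect with each subsample dropped in turn. This minimal version assumes each subsample contributes a log odds ratio and standard error combined by fixed-effect inverse-variance weighting; the abstract does not specify the pooling method, and the inputs below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def inverse_variance(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    est = sum(w * b for w, b in zip(weights, betas)) / total
    return est, math.sqrt(1.0 / total)

def leave_one_out(betas: Sequence[float], ses: Sequence[float]) -> List[Tuple[float, float]]:
    """Pooled estimate recomputed with each subsample left out in turn."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two subsamples with matching SEs")
    results = []
    for i in range(len(betas)):
        kept_b = [b for j, b in enumerate(betas) if j != i]
        kept_s = [s for j, s in enumerate(ses) if j != i]
        results.append(inverse_variance(kept_b, kept_s))
    return results

# Hypothetical per-subsample log odds ratios and SEs (not from the study).
betas = [0.12, 0.09, 0.15, 0.11]
ses = [0.03, 0.04, 0.05, 0.03]
for dropped, (est, se) in enumerate(leave_one_out(betas, ses)):
    print(f"without subsample {dropped}: beta = {est:.3f}, SE = {se:.3f}")
```

Consistency here means the leave-one-out estimates stay close to each other and to the full-sample estimate; no single subsample drives the signal.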
- Predicting antipsychotic response combining polygenic risk with protein-protein networks. Elsevier. 2019: S27
GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies
Bioinformatics
2019; 35 (1): 1–11
Multiple marker analysis of genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra-high dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to a large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using non-local priors in an iterative variable selection framework. We develop a variable selection method, named iterative non-local prior based selection for GWAS, or GWASinlps, that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of non-local priors. The hallmark of our method is the introduction of a 'structured screen-and-select' strategy, which considers hierarchical screening based not only on response-predictor associations but also on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with single nucleotide polymorphisms having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared to several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error and effect size estimation error. Further, we provide empirical power analysis useful for study design. Finally, we considered a real GWAS data application with human height as the phenotype. An R package for implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary data are available at Bioinformatics online.
DOI: 10.1093/bioinformatics/bty472
Web of Science ID: 000459313900001
PubMed ID: 29931045
PubMed Central ID: PMC6298063
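The generic iterative screen-and-select loop the abstract builds on can be sketched as: screen the SNPs most associated with the current residual, select a subset, regress the selected SNPs out of the residual, and repeat. This is a simplified stand-in, not GWASinlps: the selection rule below is a plain correlation threshold in place of non-local-prior posterior selection, the structured (hierarchical) screening is omitted, and the names `screen_size`, `threshold`, and `max_iter` are hypothetical. The actual method is implemented in the CRAN package linked in the abstract, which this sketch does not call.

```python
import math
import random
from typing import List, Sequence

def corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def regress_out(r: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residual of r after simple linear regression on x (with intercept)."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return list(r)
    slope = sum((a - mx) * (b - mr) for a, b in zip(x, r)) / sxx
    return [b - mr - slope * (a - mx) for a, b in zip(x, r)]

def screen_and_select(snps: List[List[float]], y: List[float], screen_size: int = 5,
                      threshold: float = 0.3, max_iter: int = 10) -> List[int]:
    """snps[j] is the 0/1/2 genotype vector of SNP j across individuals."""
    selected: List[int] = []
    remaining = set(range(len(snps)))
    resid = list(y)
    for _ in range(max_iter):
        if not remaining:
            break
        # Screen: rank remaining SNPs by |correlation| with the current residual.
        ranked = sorted(remaining, key=lambda j: abs(corr(snps[j], resid)), reverse=True)
        screened = ranked[:screen_size]
        # Select: stand-in threshold rule (NOT a non-local prior).
        chosen = [j for j in screened if abs(corr(snps[j], resid)) >= threshold]
        if not chosen:
            break
        for j in chosen:
            resid = regress_out(resid, snps[j])
            remaining.discard(j)
            selected.append(j)
    return selected

# Toy data: 30 hypothetical SNPs, 200 individuals; SNPs 3 and 7 carry the signal.
rng = random.Random(1)
n = 200
snps = [[rng.choice((0, 1, 2)) for _ in range(n)] for _ in range(30)]
y = [snps[3][i] - snps[7][i] + rng.gauss(0.0, 0.5) for i in range(n)]
selected = screen_and_select(snps, y)
print(sorted(selected))
```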
Revisiting Antipsychotic Drug Actions Through Gene Networks Associated With Schizophrenia
American Journal of Psychiatry
2018; 175 (7): 674–82
Antipsychotic drugs were incidentally discovered in the 1950s, but their mechanisms of action are still not understood. Better understanding of schizophrenia pathogenesis could shed light on the actions of current drugs and reveal novel "druggable" pathways for unmet therapeutic needs. Recent genome-wide association studies offer unprecedented opportunities to characterize disease gene networks and uncover drug-disease relationships. Polygenic overlap between schizophrenia risk genes and antipsychotic drug targets has been demonstrated, but the specific genes and pathways constituting this overlap are undetermined. Risk genes of polygenic disorders do not operate in isolation but in combination with other genes through protein-protein interactions among gene products. The protein interactome was used to map antipsychotic drug targets (N=88) to networks of schizophrenia risk genes (N=328). Schizophrenia risk genes were significantly localized in the interactome, forming a distinct disease module. Core genes of the module were enriched for genes involved in developmental biology and cognition, which may have a central role in schizophrenia etiology. Antipsychotic drug targets overlapped with the core disease module and comprised multiple pathways beyond dopamine. Some important risk genes, such as the CHRN, PCDH, and HCN families, were not connected to existing antipsychotics but may be suitable targets for novel drugs or drug repurposing opportunities to treat other aspects of schizophrenia, such as cognitive or negative symptoms. The network medicine approach provides a platform to collate information on disease genetics and drug-gene interactions to shift the focus from the development of antipsychotics to multitarget antischizophrenia drugs. This approach is transferable to other diseases.
DOI: 10.1176/appi.ajp.2017.17040410
Web of Science ID: 000437319200013
PubMed ID: 29495895
PubMed Central ID: PMC6028303
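In network medicine, the statement that risk genes are "significantly localized in the interactome" is commonly assessed by comparing the size of the largest connected component (LCC) formed by the gene set with the LCC sizes of random gene sets of the same size. The sketch below assumes that approach (the abstract does not state the exact statistic), uses uniform random sampling rather than the degree-preserving sampling real analyses often use, and runs on a tiny hypothetical edge list instead of a real protein interactome.

```python
import random
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def lcc_size(graph: Dict[str, Set[str]], genes: Set[str]) -> int:
    """Size of the largest connected component of the subgraph induced by genes."""
    seen: Set[str] = set()
    best = 0
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search within the gene set
            node = queue.popleft()
            size += 1
            for nb in graph.get(node, ()):
                if nb in genes and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def lcc_permutation_p(graph: Dict[str, Set[str]], genes: Set[str],
                      n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed LCC size and empirical P vs uniformly sampled gene sets."""
    genes = {x for x in genes if x in graph}   # keep genes present in the network
    observed = lcc_size(graph, genes)
    nodes = sorted(graph)
    rng = random.Random(seed)
    hits = sum(lcc_size(graph, set(rng.sample(nodes, len(genes)))) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy interactome and "risk gene" set.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"),
         ("G", "H"), ("H", "I"), ("J", "K")]
graph = build_graph(edges)
observed, p = lcc_permutation_p(graph, {"A", "B", "C", "D"})
print(f"LCC = {observed}, empirical P = {p:.4f}")
```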
Modeling prior information of common genetic variants improves gene discovery for neuroticism
Human Molecular Genetics
2017; 26 (22): 4530–39
Neuroticism reflects emotional instability and is related to various mental and physical health issues. However, most genetic variants associated with neuroticism remain unknown. Inconsistent genetic variants identified by different genome-wide association studies (GWAS) may be attributable to low statistical power. We proposed a novel framework to improve the power for gene discovery by incorporating prior information on single nucleotide polymorphisms (SNPs) and combining two relevant existing tools, relative enrichment score (RES) and conditional false discovery rate (FDR). Here, each SNP's conditional FDR was estimated given its RES, based on SNP prior information including linkage disequilibrium (LD)-weighted genic annotation scores, total LD scores and heterozygosity. A known significant locus on chromosome 8p was excluded before estimating FDR because of its long-range LD structure. Only one significant LD-independent SNP was detected by analyses of unconditional FDR and traditional GWAS in the discovery sample (N = 59 225), whereas conditional FDR detected four additional SNPs. Three of the five SNPs, all identified by conditional FDR, were replicated (P < 0.05) in an independent sample (N = 170 911). These three SNPs are located in intronic regions of CADM2, LINGO2 and EP300, which have been reported to be associated with autism, Parkinson's disease and schizophrenia, respectively. Our approach using a combination of RES and conditional FDR improved the power of traditional GWAS for gene discovery, providing a useful framework for the analysis of GWAS summary statistics by utilizing SNP prior information, and helping to elucidate the links between neuroticism and complex diseases from a genetic perspective.
DOI: 10.1093/hmg/ddx340
Web of Science ID: 000414403900019
PubMed ID: 28973307
PubMed Central ID: PMC5886256
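The conditional FDR step can be illustrated with one common empirical estimator, assumed here rather than taken from the paper: SNPs are binned by a prior score (standing in for RES), and within a bin the conditional FDR at p-value p is approximated by p / F̂(p), where F̂ is the empirical CDF of p-values in that bin and the null proportion is conservatively set to 1. The bin edges, scores, and p-values are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

def conditional_fdr(pvals: Sequence[float], scores: Sequence[float],
                    edges: Sequence[float]) -> List[float]:
    """Empirical conditional FDR of each SNP given the bin of its prior score.

    edges must be sorted; a score s falls in bin bisect_right(edges, s).
    Within a bin, cFDR(p) ~= p / F(p), where F is the bin's empirical CDF of
    p-values and the null proportion is conservatively taken as 1.
    """
    if len(pvals) != len(scores):
        raise ValueError("pvals and scores must have the same length")
    bins = [bisect_right(edges, s) for s in scores]
    by_bin: Dict[int, List[float]] = {}
    for p, b in zip(pvals, bins):
        by_bin.setdefault(b, []).append(p)
    for ps in by_bin.values():
        ps.sort()
    result = []
    for p, b in zip(pvals, bins):
        ps = by_bin[b]
        ecdf = bisect_right(ps, p) / len(ps)   # share of the bin with p-value <= p
        result.append(min(1.0, p / ecdf))
    return result

# Hypothetical p-values and prior scores: the first four SNPs fall in the high-score bin.
pvals = [0.001, 0.01, 0.5, 0.9, 0.01, 0.3, 0.6, 0.9]
scores = [1.2, 0.9, 1.1, 0.8, 0.1, 0.2, 0.0, 0.3]
cfdr = conditional_fdr(pvals, scores, edges=[0.5])
print([round(v, 3) for v in cfdr])
```

Because small p-values are over-represented in the high-score bin, the SNP with p = 0.01 there gets a conditional FDR of 0.02, whereas the same p-value in the low-score bin gets 0.04; this is the mechanism by which prior information can raise power.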
- Leveraging genome characteristics to improve gene discovery for putamen subcortical brain structure. Scientific Reports 2017; 7 (1)
Bayesian Wavelet Analysis Using Nonlocal Priors with an Application to fMRI Analysis
2017; 79: 361–388
DOI: 10.1007/s13571-016-0129-3
- Identification of Genetic Heterogeneity of Alzheimer's Disease Across Age. Genetic Epidemiology 2017; 41 (7): 706–707
Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders
2017; 49 (1): 152–56
Personality is influenced by genetic and environmental factors and associated with mental health. However, the underlying genetic determinants are largely unknown. We identified six genetic loci, including five novel loci, significantly associated with personality traits in a meta-analysis of genome-wide association studies (N = 123,132–260,861). Of these genome-wide significant loci, extraversion was associated with variants in WSCD2 and near PCDH15, and neuroticism with variants on chromosome 8p23.1 and in L3MBTL2. We performed a principal component analysis to extract major dimensions underlying genetic variations among five personality traits and six psychiatric disorders (N = 5,422–18,759). The first genetic dimension separated personality traits and psychiatric disorders, except that neuroticism and openness to experience were clustered with the disorders. High genetic correlations were found between extraversion and attention-deficit/hyperactivity disorder (ADHD) and between openness and schizophrenia and bipolar disorder. The second genetic dimension was closely aligned with extraversion-introversion and grouped neuroticism with internalizing psychopathology (e.g., depression or anxiety).
DOI: 10.1038/ng.3736
Web of Science ID: 000390976600022
PubMed ID: 27918536
PubMed Central ID: PMC5278898
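The "first genetic dimension" in the abstract is the leading principal component. Assuming the PCA is run on a matrix of pairwise genetic correlations among traits (the abstract does not say which matrix is decomposed), that component is the leading eigenvector of the matrix. The pure-Python power iteration below is a minimal sketch; the 3×3 matrix in the usage lines is hypothetical, and a real analysis would use a linear-algebra library to obtain all components.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def leading_component(m: Matrix, iters: int = 1000,
                      tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(m)
    # Unequal start vector, so it is not orthogonal to the leading eigenvector in
    # common symmetric cases (a contrived matrix could still defeat it).
    start = [float(i + 1) for i in range(n)]
    s = math.sqrt(sum(x * x for x in start))
    v = [x / s for x in start]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        # Rayleigh quotient w^T M w estimates the eigenvalue.
        new_val = sum(w[i] * sum(m[i][j] * w[j] for j in range(n)) for i in range(n))
        done = abs(new_val - eigval) < tol
        v, eigval = w, new_val
        if done:
            break
    # Fix the arbitrary sign so the largest-magnitude loading is positive.
    k = max(range(n), key=lambda i: abs(v[i]))
    if v[k] < 0:
        v = [-x for x in v]
    return eigval, v

# Hypothetical genetic correlation matrix for three traits.
rg = [[1.0, 0.6, 0.3],
      [0.6, 1.0, 0.4],
      [0.3, 0.4, 1.0]]
value, loadings = leading_component(rg)
share = value / sum(rg[i][i] for i in range(len(rg)))   # variance explained by PC1
print(f"PC1 loadings = {[round(x, 3) for x in loadings]}, share = {share:.2f}")
```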
- Bayesian optimal sequential design for nonparametric regression via inhomogeneous evolutionary MCMC. Statistical Methodology 2014; 18: 131–41
Bayesian hierarchical multi-subject multiscale analysis of functional MRI data
2012; 63 (3): 1519–31
We develop a methodology for Bayesian hierarchical multi-subject multiscale analysis of functional Magnetic Resonance Imaging (fMRI) data. We begin by modeling the brain images temporally with a standard general linear model. After that, we transform the resulting estimated standardized regression coefficient maps through a discrete wavelet transform to obtain a sparse representation in the wavelet space. Subsequently, we assign to the wavelet coefficients a prior that is a mixture of a point mass at zero and a Gaussian white noise component. In this mixture prior for the wavelet coefficients, the mixture probabilities are related to the pattern of brain activity across different resolutions. To incorporate this information, we assume that the mixture probabilities for wavelet coefficients at the same location and level are common across subjects. Furthermore, we assign the mixture probabilities a prior that depends on a few hyperparameters. We develop an empirical Bayes methodology to estimate the hyperparameters and, as these hyperparameters are shared by all subjects, we obtain precise estimated values. Then we carry out inference in the wavelet space and obtain smoothed images of the regression coefficients by applying the inverse wavelet transform to the posterior means of the wavelet coefficients. An application to computer-simulated synthetic data has shown that, compared with single-subject analysis, our multi-subject methodology performs better in terms of mean squared error. Finally, we illustrate the utility and flexibility of our multi-subject methodology with an application to an event-related fMRI dataset generated by Postle (2005) through a multi-subject fMRI study of working memory related brain activation.
DOI: 10.1016/j.neuroimage.2012.08.041
Web of Science ID: 000310379100053
PubMed ID: 22951257
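The shrinkage step described in this abstract can be illustrated on a 1-D signal. Assume an observed wavelet coefficient d = θ + ε with ε ~ N(0, σ²) and the mixture prior θ ~ (1 − π)δ₀ + π N(0, τ²). The posterior mean is then w · τ²/(τ² + σ²) · d, where w is the posterior probability that θ ≠ 0. The sketch applies this to a single level of the orthonormal Haar transform; the single level, the fixed σ², τ², π values, and all function names are illustrative assumptions, whereas the paper works with brain images, multiple levels, and empirically estimated hyperparameters shared across subjects.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)

def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert haar_forward."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def log_normal_pdf(x: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def posterior_mean(d: float, sigma2: float, tau2: float, pi: float) -> float:
    """E[theta | d] for d = theta + N(0, sigma2), theta ~ (1-pi)*delta_0 + pi*N(0, tau2)."""
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    log_slab = math.log(pi) + log_normal_pdf(d, sigma2 + tau2)   # marginal if theta != 0
    log_spike = math.log(1.0 - pi) + log_normal_pdf(d, sigma2)   # marginal if theta == 0
    diff = log_spike - log_slab
    # Posterior probability that theta != 0, as a numerically stable logistic.
    if diff > 0:
        w = math.exp(-diff) / (1.0 + math.exp(-diff))
    else:
        w = 1.0 / (1.0 + math.exp(diff))
    return w * tau2 / (tau2 + sigma2) * d

def denoise(x: List[float], sigma2: float = 0.1, tau2: float = 4.0,
            pi: float = 0.2) -> List[float]:
    """Shrink the detail coefficients and transform back (hyperparameters are illustrative)."""
    approx, detail = haar_forward(x)
    shrunk = [posterior_mean(d, sigma2, tau2, pi) for d in detail]
    return haar_inverse(approx, shrunk)

# Hypothetical noisy signal.
print([round(v, 3) for v in denoise([0.1, -0.2, 3.0, 0.2, 0.0, 0.1, 2.5, 2.4])])
```

Small detail coefficients are pulled almost to zero, while large ones are kept and only mildly shrunk by the factor τ²/(τ² + σ²).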