A 20-Gene Set Predictive of Progression to Severe Dengue.
2019; 26 (5): 1104
There is a need to identify biomarkers predictive of severe dengue. Single-cohort transcriptomics has not yielded generalizable results or parsimonious, predictive gene sets. We analyzed blood samples of dengue patients from seven gene expression datasets (446 samples, five countries) using an integrated multi-cohort analysis framework and identified a 20-gene set that predicts progression to severe dengue. We validated the predictive power of this 20-gene set in three retrospective dengue datasets (84 samples, three countries) and a prospective Colombia cohort (34 patients), with an area under the receiver operating characteristic curve of 0.89, 100% sensitivity, and 76% specificity. The 20-gene dengue severity scores declined during the disease course, suggesting an infection-triggered host response. This 20-gene set is strongly associated with the progression to severe dengue and represents a predictive signature, generalizable across ages, host genetic factors, and virus strains, with potential implications for the development of a host response-based dengue prognostic assay.
View details for PubMedID 30699342
- A 20-Gene Set Predictive of Progression to Severe Dengue CELL REPORTS 2019; 26 (5): 1104-+
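The validation metrics quoted in the abstract above (area under the ROC curve, sensitivity, specificity) can be illustrated with a minimal sketch. The scores, labels, and threshold below are invented for illustration and are not data from the study; the function names `auroc` and `sensitivity_specificity` are hypothetical, and the abstract does not say how the severity score itself is computed, so the sketch takes the scores as given.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Call a sample positive when its score is >= threshold, then return
    (sensitivity, specificity) against the true labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up severity scores; label 1 = progressed to severe disease (assumption).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

print(auroc(scores, labels))
print(sensitivity_specificity(scores, labels, threshold=0.6))
```

Lowering the threshold until every progressor is flagged gives 100% sensitivity at the cost of some false positives, which is the same kind of trade-off as the 100% sensitivity and 76% specificity reported in the abstract.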
SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs.
Pacific Symposium on Biocomputing
2019; 24: 184–95
Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging because most single nucleotide polymorphisms (SNPs) fall in non-coding regions of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this challenge, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP to publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach, which leverages latent patterns across genome-wide epigenetic datasets to infer biological function, will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.
View details for PubMedID 30864321
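The latent-factor step described in the SNPs2ChIP abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the matrix is random placeholder data, the layout (rows as genomic regions, columns as ChIP-seq tracks) is assumed, and the normalized squared-loading score in the hypothetical helper `top_latent_factors` is one plausible choice of "quantitative score" that the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder signal matrix (assumption): 50 genomic regions x 8 ChIP-seq tracks.
signal = rng.random((50, 8))

# Thin singular value decomposition: signal = U @ diag(s) @ Vt.
# Each column of U (paired with a row of Vt) is one latent factor.
U, s, Vt = np.linalg.svd(signal, full_matrices=False)


def top_latent_factors(region_index, k=3):
    """Return the k latent factors that contribute most to one region.

    The score is the squared loading of the region on each factor,
    normalized to sum to 1 over all factors (an assumed scoring rule).
    """
    contrib = (U[region_index] * s) ** 2
    scores = contrib / contrib.sum()
    order = np.argsort(scores)[::-1][:k]
    return [(int(j), float(scores[j])) for j in order]


# Query the first region; each tuple is (latent factor index, score).
print(top_latent_factors(0))
```

In a pipeline like the one the abstract describes, each latent factor would additionally carry a functional annotation, so the factors returned for a query region would come with their inferred biological functions.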