Basic Life Science Research Associate, Statistics
FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome
NUCLEIC ACIDS RESEARCH
2018; 46 (2): e11
Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in, is often regarded as one of the most effective measures of alternative splicing events. The existing methods for estimating exon-inclusion ratios at the genome scale all require the existence of a reference transcriptome. In this paper, we propose an alignment-free method, FreePSI, to perform genome-wide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify the exon-inclusion ratios at the genome scale and an efficient expectation-maximization algorithm based on a divide-and-conquer strategy and ultrafast conjugate gradient projection descent method to solve the model. We compare FreePSI with the existing methods on simulated and real RNA-seq data in terms of both accuracy and efficiency and show that it is able to achieve very good performance even though a reference transcriptome is not provided. Our results suggest that FreePSI may have important applications in performing alternative splicing analysis for organisms that do not have quality reference transcriptomes. FreePSI is implemented in C++ and freely available to the public on GitHub.
View details for PubMedID 29136203
View details for PubMedCentralID PMC5778508
Simultaneous inference of phenotype-associated genes and relevant tissues from GWAS data via Bayesian integration of multiple tissue-specific gene networks
JOURNAL OF MOLECULAR CELL BIOLOGY
2017; 9 (6): 436–52
Although genome-wide association studies (GWAS) have successfully identified thousands of genomic loci associated with hundreds of complex traits in the past decade, the debate about such problems as missing heritability and weak interpretability has been appealing for effective computational methods to facilitate the advanced analysis of the vast volume of existing and anticipated genetic data. Towards this goal, gene-level integrative GWAS analysis with the assumption that genes associated with a phenotype tend to be enriched in biological gene sets or gene networks has recently attracted much attention, due to such advantages as straightforward interpretation, less multiple testing burdens, and robustness across studies. However, existing methods in this category usually exploit non-tissue-specific gene networks and thus lack the ability to utilize informative tissue-specific characteristics. To overcome this limitation, we proposed a Bayesian approach called SIGNET (Simultaneously Inference of GeNEs and Tissues) to integrate GWAS data and multiple tissue-specific gene networks for the simultaneous inference of phenotype-associated genes and relevant tissues. Through extensive simulation studies, we showed the effectiveness of our method in finding both associated genes and relevant tissues for a phenotype. In applications to real GWAS data of 14 complex phenotypes, we demonstrated the power of our method in both deciphering genetic basis and discovering biological insights of a phenotype. With this understanding, we expect to see SIGNET as a valuable tool for integrative GWAS analysis, thereby boosting the prevention, diagnosis, and treatment of human inherited diseases and eventually facilitating precision medicine.
View details for PubMedID 29300920