Bohdan Khomtchouk, Ph.D., is a data science postdoctoral fellow in computational epigenetics in the Gozani Lab at Stanford University in Stanford, CA, USA. Bohdan's research involves understanding the data science behind aging-related diseases as well as creating artificial intelligence and machine learning software to organize the world's biological information at a massive scale. This work sits at the interdisciplinary interface of big data, integrative bioinformatics, multi-omics, natural language processing, and statistical learning. Bohdan serves as the chief bioinformatician and computational biologist for the Gozani Lab, working across a broad array of next-generation sequencing and related high-throughput data, including RNA-seq, small RNA-seq, methyl-seq, ATAC-seq, ChIP-seq, and mass spectrometry. Bohdan is also the co-founder of Bioquilt (http://bioquilt.com/) and president of the Stanford R Group (http://www.stanfordr.com/).
Honors & Awards
NIH/NIA Stanford Training Program in Aging Research (T32 AG0047126), National Institute on Aging of the National Institutes of Health (2017-2018)
National Defense Science & Engineering (NDSEG) Graduate Fellowship (32 CFR 168a), Department of Defense, Army Research Office (Biosciences Division) (2014-2017)
UM Graduate Fellowship, University of Miami (2013-2014)
Education
Doctor of Philosophy, University of Miami Miller School of Medicine, Computational Human Genetics & Genomics (2017)
Bachelor of Science, Benedictine University, Molecular Biology & Biochemistry (summa cum laude) (2013)
Bachelor of Science, Benedictine University, Physics (summa cum laude) (2013)
Bachelor of Science, Benedictine University, Mathematics (summa cum laude) (2013)
Publications
shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics.
2017; 12 (5)
Transcriptomics, metabolomics, metagenomics, and other various next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers to entry for creating highly customizable heatmaps. We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap.
Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on GitHub: https://github.com/Bohdan-Khomtchouk/fastheatmap.
View details for DOI 10.1371/journal.pone.0176334
View details for PubMedID 28493881
View details for PubMedCentralID PMC5426587
How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications.
Briefings in bioinformatics
We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.
View details for DOI 10.1093/bib/bbw130
View details for PubMedID 28040748
Ischemic Preconditioning Confers Epigenetic Repression of Mtor and Induction of Autophagy Through G9a-Dependent H3K9 Dimethylation
Journal of the American Heart Association
2016; 5 (12)
Ischemic preconditioning (IPC) protects the heart from prolonged ischemic insult and reperfusion injury through a poorly understood mechanism. Post-translational modifications of histone residues can confer rapid and drastic switches in gene expression in response to various stimuli, including ischemia. The aim of this study was to investigate the effect of histone methylation in the response to cardiac ischemic preconditioning. We used cardiac biopsies from mice subjected to IPC to quantify global levels of 3 of the most well-studied histone methylation marks (H3K9me2, H3K27me3, and H3K4me3) with Western blot and found that H3K9me2 levels were significantly increased in the area at risk compared to remote myocardium. In order to assess which genes were affected by the increase in H3K9me2 levels, we performed ChIP-Seq and transcriptome profiling using microarray. Two hundred thirty-seven genes were both transcriptionally repressed and enriched in H3K9me2 in the area at risk of IPC mice. Of these, Mtor (Mechanistic target of rapamycin) was chosen for mechanistic studies. Knockdown of the major H3K9 methyltransferase G9a resulted in a significant decrease in H3K9me2 levels across Mtor, increased Mtor expression, as well as decreased autophagic activity in response to rapamycin and serum starvation. IPC confers an increase of H3K9me2 levels throughout the Mtor gene (a master regulator of cellular metabolism and a key player in the cardioprotective effect of IPC), leading to transcriptional repression via the methyltransferase G9a. The results of this study indicate that G9a has an important role in regulating cardiac autophagy and the cardioprotective effect of IPC.
View details for DOI 10.1161/JAHA.116.004076
View details for Web of Science ID 000390787700014
View details for PubMedID 28007739
View details for PubMedCentralID PMC5210409
MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps
View details for DOI 10.1186/s12859-016-1260-x
View details for Web of Science ID 000383749300001
View details for PubMedID 27659774
View details for PubMedCentralID PMC5034416
Dependence-induced increase of alcohol self-administration and compulsive drinking mediated by the histone methyltransferase PRDM2.
Epigenetic processes have been implicated in the pathophysiology of alcohol dependence, but the specific molecular mechanisms mediating dependence-induced neuroadaptations remain largely unknown. Here, we found that a history of alcohol dependence persistently decreased the expression of Prdm2, a histone methyltransferase that monomethylates histone 3 at the lysine 9 residue (H3K9me1), in the rat dorsomedial prefrontal cortex (dmPFC). Downregulation of Prdm2 was associated with decreased H3K9me1, supporting that changes in Prdm2 mRNA levels affected its activity. Chromatin immunoprecipitation followed by massively parallel DNA sequencing showed that genes involved in synaptic communication are epigenetically regulated by H3K9me1 in dependent rats. In non-dependent rats, viral-vector-mediated knockdown of Prdm2 in the dmPFC resulted in expression changes similar to those observed following a history of alcohol dependence. Prdm2 knockdown resulted in increased alcohol self-administration, increased aversion-resistant alcohol intake and enhanced stress-induced relapse to alcohol seeking, a phenocopy of postdependent rats. Collectively, these results identify a novel epigenetic mechanism that contributes to the development of alcohol-seeking behavior following a history of dependence.
Molecular Psychiatry advance online publication, 30 August 2016; doi:10.1038/mp.2016.131.
View details for DOI 10.1038/mp.2016.131
View details for PubMedID 27573876
Survival Guide to Organic Chemistry: Bridging the Gap from General Chemistry
CRC Press (Taylor & Francis). 2016
HeatmapGenerator: high performance RNAseq and microarray visualization software suite to examine differential gene expression levels using an R and C++ hybrid computational pipeline.
Source code for biology and medicine
2014; 9 (1): 30
The graphical visualization of gene expression data using heatmaps has become an integral component of modern-day medical research. Heatmaps are used extensively to plot quantitative differences in gene expression levels, such as those measured with RNAseq and microarray experiments, to provide qualitative large-scale views of the transcriptomic landscape. Creating high-quality heatmaps is a computationally intensive task, often requiring considerable programming experience, particularly for customizing features to a specific dataset at hand. Software to create publication-quality heatmaps is developed with the R programming language, C++ programming language, and OpenGL application programming interface (API) to create industry-grade high performance graphics. We create a graphical user interface (GUI) software package called HeatmapGenerator for Windows OS and Mac OS X as an intuitive, user-friendly alternative for researchers with minimal prior coding experience to allow them to create publication-quality heatmaps using R graphics without sacrificing their desired level of customization. The simplicity of HeatmapGenerator is that it only requires the user to upload a preformatted input file and download the publicly available R software language, among a few other operating system-specific requirements. Advanced features such as color, text labels, scaling, legend construction, and even database storage can be easily customized with no prior programming knowledge. We provide an intuitive and user-friendly software package, HeatmapGenerator, to create high-quality, customizable heatmaps generated using the high-resolution color graphics capabilities of R. The software is available for Microsoft Windows and Apple Mac OS X. HeatmapGenerator is released under the GNU General Public License and publicly available at: http://sourceforge.net/projects/heatmapgenerator/.
The Mac OS X direct download is available at: http://sourceforge.net/projects/heatmapgenerator/files/HeatmapGenerator_MAC_OSX.tar.gz/download. The Windows OS direct download is available at: http://sourceforge.net/projects/heatmapgenerator/files/HeatmapGenerator_WINDOWS.zip/download.
View details for DOI 10.1186/s13029-014-0030-2
View details for PubMedID 25550709
View details for PubMedCentralID PMC4279803