Honors & Awards


  • Faculty Fellow at the Stanford Center at Peking University, SCPKU (September-October 2016)
  • Henri Benedictus Fellow, King Baudouin Foundation (June 2009)
  • Honorary Fellow, Belgian American Educational Foundation (BAEF) (June 2009)

Boards, Advisory Committees, Professional Organizations


  • Member, International Society for Computational Biology (ISCB) (2006 - Present)
  • Member, American Association for Cancer Research (AACR) (2010 - Present)

Professional Education


  • Certificate, Stanford Business School, Stanford Ignite (2012)
  • Ph.D., University of Leuven, Belgium, Bioinformatics (2008)
  • M.S., University of Leuven, Belgium, Artificial Intelligence (2004)
  • M.S., University College, Ghent, Belgium, Electrical Engineering/Computer Science (2003)

Current Research and Scholarly Interests


My lab focuses on biomedical data fusion: the development of machine learning methods for biomedical decision support using multi-scale biomedical data. Previously, we pioneered data fusion work using Bayesian and kernel methods to study breast and ovarian cancer. We also developed computational algorithms for the identification of driver genes using multi-omics data. We are now working on multi-scale biomedical data fusion methods that bridge the molecular scale (omics data), the cellular scale (pathology data) and the tissue scale (medical imaging data).

Clinical Trials


  • Liquid Biopsy With PET/CT Versus PET/CT Alone in Diagnosis of Small Lung Nodules (Recruiting)

    The purpose of this study is to determine if a liquid biopsy, a method of detecting cancer from a blood draw, combined with a PET/CT scan, a type of radiological scan, is better at determining whether a lung nodule is cancerous when compared to a PET/CT scan alone. A PET/CT scan is already used for diagnosis of lung nodules, but its efficacy is uncertain in nodules 6-20 mm in size. Therefore, the PET/CT will be evaluated for its diagnostic ability in lesions this size alone and in combination with a liquid biopsy. Secondarily, a machine learning model will be created to see if the combination of the PET/CT imaging data and the liquid biopsy data can predict the presence of cancer.

2023-24 Courses


Stanford Advisees


Graduate and Fellowship Programs


All Publications


  • Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models. Nature biomedical engineering Carrillo-Perez, F., Pizurica, M., Zheng, Y., Nandi, T. N., Madduri, R., Shen, J., Gevaert, O. 2024

    Abstract

    Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

    View details for DOI 10.1038/s41551-024-01193-8

    View details for PubMedID 38514775

  • GeNNius: An ultrafast drug-target interaction inference method based on graph neural networks. Bioinformatics (Oxford, England) Veleiro, U., de la Fuente, J., Serrano, G., Pizurica, M., Casals, M., Pineda-Lucena, A., Vicent, S., Ochoa, I., Gevaert, O., Hernaez, M. 2023

    Abstract

    Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets, and the generalization of DTI prediction methods to unseen datasets remains unexplored, which could potentially improve the development processes of DTI inferring approaches in terms of accuracy and robustness. In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its prediction power to uncover new interactions by evaluating not previously known DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we investigated qualitatively the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. GeNNius code is available at https://github.com/ubioinformat/GeNNius.

    View details for DOI 10.1093/bioinformatics/btad774

    View details for PubMedID 38134424

  • Synthetic whole-slide image tile generation with gene expression profile-infused deep generative models. Cell reports methods Carrillo-Perez, F., Pizurica, M., Ozawa, M. G., Vogel, H., West, R. B., Kong, C. S., Herrera, L. J., Shen, J., Gevaert, O. 2023; 3 (8): 100534

    Abstract

    In this work, we propose an approach to generate whole-slide image (WSI) tiles by using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Tiles generated by RNA-GAN were preferred by expert pathologists compared with tiles generated using traditional GANs, and in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside of the training set, showing imputation capabilities. A web-based quiz is available for users to play a game distinguishing real and synthetic tiles: https://rna-gan.stanford.edu/, and the code for RNA-GAN is available here: https://github.com/gevaertlab/RNA-GAN.

    View details for DOI 10.1016/j.crmeth.2023.100534

    View details for PubMedID 37671024

  • EpiMix is an integrative tool for epigenomic subtyping using DNA methylation. Cell reports methods Zheng, Y., Jun, J., Brennan, K., Gevaert, O. 2023; 3 (7): 100515

    Abstract

    DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.

    View details for DOI 10.1016/j.crmeth.2023.100515

    View details for PubMedID 37533639

    View details for PubMedCentralID PMC10391348

  • Spatial cellular architecture predicts prognosis in glioblastoma. Nature communications Zheng, Y., Carrillo-Perez, F., Pizurica, M., Heiland, D. H., Gevaert, O. 2023; 14 (1): 4122

    Abstract

    Intra-tumoral heterogeneity and cell-state plasticity are key drivers for the therapeutic resistance of glioblastoma. Here, we investigate the association between spatial cellular organization and glioblastoma prognosis. Leveraging single-cell RNA-seq and spatial transcriptomics data, we develop a deep learning model to predict transcriptional subtypes of glioblastoma cells from histology images. Employing this model, we phenotypically analyze 40 million tissue spots from 410 patients and identify consistent associations between tumor architecture and prognosis across two independent cohorts. Patients with poor prognosis exhibit higher proportions of tumor cells expressing a hypoxia-induced transcriptional program. Furthermore, a clustering pattern of astrocyte-like tumor cells is associated with worse prognosis, while dispersion and connection of the astrocytes with other transcriptional subtypes correlate with decreased risk. To validate these results, we develop a separate deep learning model that utilizes histology images to predict prognosis. Applying this model to spatial transcriptomics data reveals survival-associated regional gene expression programs. Overall, our study presents a scalable approach to unravel the transcriptional heterogeneity of glioblastoma and establishes a critical connection between spatial cellular architecture and clinical outcomes.

    View details for DOI 10.1038/s41467-023-39933-0

    View details for PubMedID 37433817

    View details for PubMedCentralID PMC10336135

  • SCOPE: predicting future diagnoses in office visits using electronic health records. Scientific reports Mukherjee, P., Humbert-Droz, M., Chen, J. H., Gevaert, O. 2023; 13 (1): 11005

    Abstract

    We propose an interpretable and scalable model to predict likely diagnoses at an encounter based on past diagnoses and lab results. This model is intended to aid physicians in their interaction with the electronic health records (EHR). To accomplish this, we retrospectively collected and de-identified EHR data of 2,701,522 patients at Stanford Healthcare over a time period from January 2008 to December 2016. A population-based sample of patients comprising 524,198 individuals (44% M, 56% F) with multiple encounters with at least one frequently occurring diagnosis code was chosen. A calibrated model was developed to predict ICD-10 diagnosis codes at an encounter based on the past diagnoses and lab results, using a binary relevance based multi-label modeling strategy. Logistic regression and random forests were tested as the base classifier, and several time windows were tested for aggregating the past diagnoses and labs. This modeling approach was compared to a recurrent neural network based deep learning method. The best model used random forest as the base classifier and integrated demographic features, diagnosis codes, and lab results. The best model was calibrated and its performance was comparable to or better than existing methods in terms of various metrics, including a median AUROC of 0.904 (IQR [0.838, 0.954]) over 583 diseases. When predicting the first occurrence of a disease label for a patient, the median AUROC with the best model was 0.796 (IQR [0.737, 0.868]). Our modeling approach performed comparably to the tested deep learning method, outperforming it in terms of AUROC (p<0.001) but underperforming in terms of AUPRC (p<0.001). Interpreting the model showed that the model uses meaningful features and highlights many interesting associations among diagnoses and lab results. We conclude that the multi-label model performs comparably with the RNN-based deep learning model while offering simplicity and potentially superior interpretability. While the model was trained and validated on data obtained from a single institution, its simplicity, interpretability and performance make it a promising candidate for deployment.

    View details for DOI 10.1038/s41598-023-38257-9

    View details for PubMedID 37419945

  • Machine learning with multimodal data for COVID-19. Heliyon Chen, W., Sá, R. C., Bai, Y., Napel, S., Gevaert, O., Lauderdale, D. S., Giger, M. L. 2023; 9 (7): e17934

    Abstract

    In response to the unprecedented global healthcare crisis of the COVID-19 pandemic, the scientific community has joined forces to tackle the challenges and prepare for future pandemics. Multiple modalities of data have been investigated to understand the nature of COVID-19. In this paper, MIDRC investigators present an overview of the state-of-the-art development of multimodal machine learning for COVID-19 and model assessment considerations for future studies. We begin with a discussion of the lessons learned from radiogenomic studies for cancer diagnosis. We then summarize the multi-modality COVID-19 data investigated in the literature including symptoms and other clinical data, laboratory tests, imaging, pathology, physiology, and other omics data. Publicly available multimodal COVID-19 data provided by MIDRC and other sources are summarized. After an overview of machine learning developments using multimodal data for COVID-19, we present our perspectives on the future development of multimodal machine learning models for COVID-19.

    View details for DOI 10.1016/j.heliyon.2023.e17934

    View details for PubMedID 37483733

    View details for PubMedCentralID PMC10362086

  • Whole slide imaging-based prediction of TP53 mutations identifies an aggressive disease phenotype in prostate cancer. Cancer research Pizurica, M., Larmuseau, M., Van der Eecken, K., de Schaetzen van Brienen, L., Carrillo-Perez, F., Isphording, S., Lumen, N., Van Dorpe, J., Ost, P., Verbeke, S., Gevaert, O., Marchal, K. 2023

    Abstract

    In prostate cancer, there is an urgent need for objective prognostic biomarkers that identify the metastatic potential of a tumor at an early stage. While recent analyses indicated TP53 mutations as candidate biomarkers, molecular profiling in a clinical setting is complicated by tumor heterogeneity. Deep learning models that predict the spatial presence of TP53 mutations in whole slide images (WSIs) offer the potential to mitigate this issue. To assess the potential of WSIs as proxies for spatially resolved profiling and as biomarkers for aggressive disease, we developed TiDo, a deep learning model that achieves state-of-the-art performance in predicting TP53 mutations from WSIs of primary prostate tumors. In an independent multi-focal cohort, the model showed successful generalization at both the patient and lesion level. Analysis of model predictions revealed that false positive (FP) predictions could at least partially be explained by TP53 deletions, suggesting that some FP carry an alteration that leads to the same histological phenotype as TP53 mutations. Comparative expression and histological cell type analyses identified a TP53-like cellular phenotype triggered by expression of pathways affecting stromal composition. Together, these findings indicate that WSI-based models might not be able to perfectly predict the spatial presence of individual TP53 mutations but they have the potential to elucidate the prognosis of a tumor by depicting a downstream phenotype associated with aggressive disease biomarkers.

    View details for DOI 10.1158/0008-5472.CAN-22-3113

    View details for PubMedID 37352385

  • Augmenting digital twins with federated learning in medicine. The Lancet. Digital health Nagaraj, D., Khandelwal, P., Steyaert, S., Gevaert, O. 2023; 5 (5): e251-e253

    View details for DOI 10.1016/S2589-7500(23)00044-4

    View details for PubMedID 37100540

  • Multimodal data fusion for cancer biomarker discovery with deep learning. Nature machine intelligence Steyaert, S., Pizurica, M., Nagaraj, D., Khandelwal, P., Hernandez-Boussard, T., Gentles, A. J., Gevaert, O. 2023; 5 (4): 351-362

    Abstract

    Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated ranging from molecular, histopathology, radiology to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.

    View details for DOI 10.1038/s42256-023-00633-5

    View details for PubMedID 37693852

    View details for PubMedCentralID PMC10484010

  • Multimodal deep learning to predict prognosis in adult and pediatric brain tumors. Communications medicine Steyaert, S., Qiu, Y. L., Zheng, Y., Mukherjee, P., Vogel, H., Gevaert, O. 2023; 3 (1): 44

    Abstract

    The introduction of deep learning in both imaging and genomics has significantly advanced the analysis of biomedical data. For complex diseases such as cancer, different data modalities may reveal different disease characteristics, and the integration of imaging with genomic data has the potential to unravel additional information than when using these data sources in isolation. Here, we propose a DL framework that combines these two modalities with the aim to predict brain tumor prognosis. Using two separate glioma cohorts of 783 adults and 305 pediatric patients we developed a DL framework that can fuse histopathology images with gene expression profiles. Three strategies for data fusion were implemented and compared: early, late, and joint fusion. Additional validation of the adult glioma models was done on an independent cohort of 97 adult patients. Here we show that the developed multimodal data models achieve better prediction results compared to the single data models, but also lead to the identification of more relevant biological pathways. When testing our adult models on a third brain tumor dataset, we show our multimodal framework is able to generalize and performs better on new data from different cohorts. Leveraging the concept of transfer learning, we demonstrate how our pediatric multimodal models can be used to predict prognosis for two more rare (less available samples) pediatric brain tumors. Our study illustrates that a multimodal data fusion approach can be successfully implemented and customized to model clinical outcome of adult and pediatric brain tumors.

    View details for DOI 10.1038/s43856-023-00276-y

    View details for PubMedID 36991216

    View details for PubMedCentralID 5563115

  • A deep-learning algorithm to classify skin lesions from mpox virus infection. Nature medicine Thieme, A. H., Zheng, Y., Machiraju, G., Sadee, C., Mittermaier, M., Gertler, M., Salinas, J. L., Srinivasan, K., Gyawali, P., Carrillo-Perez, F., Capodici, A., Uhlig, M., Habenicht, D., Loser, A., Kohler, M., Schuessler, M., Kaul, D., Gollrad, J., Ma, J., Lippert, C., Billick, K., Bogoch, I., Hernandez-Boussard, T., Geldsetzer, P., Gevaert, O. 2023

    Abstract

    Undetected infection and delayed isolation of infected individuals are key factors driving the monkeypox virus (now termed mpox virus or MPXV) outbreak. To enable earlier detection of MPXV infection, we developed an image-based deep convolutional neural network (named MPXV-CNN) for the identification of the characteristic skin lesions caused by MPXV. We assembled a dataset of 139,198 skin lesion images, split into training/validation and testing cohorts, comprising non-MPXV images (n=138,522) from eight dermatological repositories and MPXV images (n=676) from the scientific literature, news articles, social media and a prospective cohort of the Stanford University Medical Center (n=63 images from 12 patients, all male). In the validation and testing cohorts, the sensitivity of the MPXV-CNN was 0.83 and 0.91, the specificity was 0.965 and 0.898 and the area under the curve was 0.967 and 0.966, respectively. In the prospective cohort, the sensitivity was 0.89. The classification performance of the MPXV-CNN was robust across various skin tones and body regions. To facilitate the usage of the algorithm, we developed a web-based app by which the MPXV-CNN can be accessed for patient guidance. The capability of the MPXV-CNN for identifying MPXV lesions has the potential to aid in MPXV outbreak mitigation.

    View details for DOI 10.1038/s41591-023-02225-7

    View details for PubMedID 36864252

  • Identifying key multifunctional components shared by critical cancer and normal liver pathways via SparseGMM. Cell reports methods Bakr, S., Brennan, K., Mukherjee, P., Argemi, J., Hernaez, M., Gevaert, O. 2023; 3 (1): 100392

    Abstract

    Despite the abundance of multimodal data, suitable statistical models that can improve our understanding of diseases with genetic underpinnings are challenging to develop. Here, we present SparseGMM, a statistical approach for gene regulatory network discovery. SparseGMM uses latent variable modeling with sparsity constraints to learn Gaussian mixtures from multiomic data. By combining coexpression patterns with a Bayesian framework, SparseGMM quantitatively measures confidence in regulators and uncertainty in target gene assignment by computing gene entropy. We apply SparseGMM to liver cancer and normal liver tissue data and evaluate discovered gene modules in an independent single-cell RNA sequencing (scRNA-seq) dataset. SparseGMM identifies PROCR as a regulator of angiogenesis and PDCD1LG2 and HNF4A as regulators of immune response and blood coagulation in cancer. Furthermore, we show that more genes have significantly higher entropy in cancer compared with normal liver. Among high-entropy genes are key multifunctional components shared by critical pathways, including p53 and estrogen signaling.

    View details for DOI 10.1016/j.crmeth.2022.100392

    View details for Web of Science ID 000925842300001

    View details for PubMedID 36814838

    View details for PubMedCentralID PMC9939431

  • Imaging genomics: data fusion in uncovering disease heritability. Trends in molecular medicine Hartmann, K., Sadée, C. Y., Satwah, I., Carrillo-Perez, F., Gevaert, O. 2022

    Abstract

    Sequencing of the human genome in the early 2000s enabled probing of the genetic basis of disease on a scale previously unimaginable. Now, two decades later, after interrogating millions of markers in thousands of individuals, a significant portion of disease heritability still remains hidden. Recent efforts to unravel this 'missing heritability' have focused on garnering new insight from merging different data types, including medical imaging. Imaging offers promising intermediate phenotypes to bridge the gap between genetic variation and disease pathology. In this review we outline this fusion and provide examples of imaging genomics in a range of diseases, from oncology to cardiovascular and neurodegenerative disease. Finally, we discuss how ongoing revolutions in data science and sharing are primed to advance the field.

    View details for DOI 10.1016/j.molmed.2022.11.002

    View details for PubMedID 36470817

  • Accurate detection of benign and malignant renal tumor subtypes with MethylBoostER: An epigenetic marker-driven learning framework. Science advances Rossi, S. H., Newsham, I., Pita, S., Brennan, K., Park, G., Smith, C. G., Lach, R. P., Mitchell, T., Huang, J., Babbage, A., Warren, A. Y., Leppert, J. T., Stewart, G. D., Gevaert, O., Massie, C. E., Samarajiwa, S. A. 2022; 8 (39): eabn9828

    Abstract

    Current gold standard diagnostic strategies are unable to accurately differentiate malignant from benign small renal masses preoperatively; consequently, 20% of patients undergo unnecessary surgery. Devising a more confident presurgical diagnosis is key to improving treatment decision-making. We therefore developed MethylBoostER, a machine learning model leveraging DNA methylation data from 1228 tissue samples, to classify pathological subtypes of renal tumors (benign oncocytoma, clear cell, papillary, and chromophobe RCC) and normal kidney. The prediction accuracy in the testing set was 0.960, with class-wise ROC AUCs >0.988 for all classes. External validation was performed on >500 samples from four independent datasets, achieving AUCs >0.89 for all classes and average accuracies of 0.824, 0.703, 0.875, and 0.894 for the four datasets. Furthermore, consistent classification of multiregion samples (N = 185) from the same patient demonstrates that methylation heterogeneity does not limit model applicability. Following further clinical studies, MethylBoostER could facilitate a more confident presurgical diagnosis to guide treatment decision-making in the future.

    View details for DOI 10.1126/sciadv.abn9828

    View details for PubMedID 36170366

  • Disparities in dermatology AI performance on a diverse, curated clinical image set. Science advances Daneshjou, R., Vodrahalli, K., Novoa, R. A., Jenkins, M., Liang, W., Rotemberg, V., Ko, J., Swetter, S. M., Bailey, E. E., Gevaert, O., Mukherjee, P., Phung, M., Yekrang, K., Fong, B., Sahasrabudhe, R., Allerup, J. A., Okata-Karigane, U., Zou, J., Chiou, A. S. 2022; 8 (32): eabq6147

    Abstract

    An estimated 3 billion people lack access to dermatological care globally. Artificial intelligence (AI) may aid in triaging skin diseases and identifying malignancies. However, most AI models have not been assessed on images of diverse skin tones or uncommon diseases. Thus, we created the Diverse Dermatology Images (DDI) dataset-the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. We show that state-of-the-art dermatology AI models exhibit substantial limitations on the DDI dataset, particularly on dark skin tones and uncommon diseases. We find that dermatologists, who often label AI datasets, also perform worse on images of dark skin tones and uncommon diseases. Fine-tuning AI models on the DDI images closes the performance gap between light and dark skin tones. These findings identify important weaknesses and biases in dermatology AI that should be addressed for reliable application to diverse patients and diseases.

    View details for DOI 10.1126/sciadv.abq6147

    View details for PubMedID 35960806

  • A web-based app to provide personalized recommendations for COVID-19. Nature medicine Thieme, A. H., Gertler, M., Mittermaier, M., Groschel, M. I., Chen, J. H., Piening, B., Benzler, J., Habenicht, D., Budach, V., Gevaert, O. 2022

    View details for DOI 10.1038/s41591-022-01797-0

    View details for PubMedID 35534571

  • Strategies to Address the Lack of Labeled Data for Supervised Machine Learning Training With Electronic Health Records: Case Study for the Extraction of Symptoms From Clinical Notes. JMIR medical informatics Humbert-Droz, M., Mukherjee, P., Gevaert, O. 2022; 10 (3): e32903

    Abstract

    BACKGROUND: Automated extraction of symptoms from clinical notes is a challenging task owing to the multidimensional nature of symptom description. The availability of labeled training data is extremely limited owing to the nature of the data containing protected health information. Natural language processing and machine learning to process clinical text for such a task have great potential. However, supervised machine learning requires a great amount of labeled data to train a model, which is at the origin of the main bottleneck in model development. OBJECTIVE: The aim of this study is to address the lack of labeled data by proposing 2 alternatives to manual labeling for the generation of training labels for supervised machine learning with English clinical text. We aim to demonstrate that using lower-quality labels for training leads to good classification results. METHODS: We addressed the lack of labels with 2 strategies. The first approach took advantage of the structured part of electronic health records and used diagnosis codes (International Classification of Disease-10th revision) to derive training labels. The second approach used weak supervision and data programming principles to derive training labels. We propose to apply the developed framework to the extraction of symptom information from outpatient visit progress notes of patients with cardiovascular diseases. RESULTS: We used >500,000 notes for training our classification model with International Classification of Disease-10th revision codes as labels and >800,000 notes for training using labels derived from weak supervision. We show that the dependence between prevalence and recall becomes flat provided a sufficiently large training set is used (>500,000 documents). We further demonstrate that using weak labels for training rather than the electronic health record codes derived from the patient encounter leads to an overall improved recall score (10% improvement, on average). Finally, the external validation of our models shows excellent predictive performance and transferability, with an overall increase of 20% in the recall score. CONCLUSIONS: This work demonstrates the power of using a weak labeling pipeline to annotate and extract symptom mentions in clinical text, with the prospects to facilitate symptom information integration for a downstream clinical task such as clinical decision support.

    View details for DOI 10.2196/32903

    View details for PubMedID 35285805

  • Exploring approaches for predictive cancer patient digital twins: Opportunities for collaboration and innovation. Frontiers in digital health Stahlberg, E. A., Abdel-Rahman, M., Aguilar, B., Asadpoure, A., Beckman, R. A., Borkon, L. L., Bryan, J. N., Cebulla, C. M., Chang, Y. H., Chatterjee, A., Deng, J., Dolatshahi, S., Gevaert, O., Greenspan, E. J., Hao, W., Hernandez-Boussard, T., Jackson, P. R., Kuijjer, M., Lee, A., Macklin, P., Madhavan, S., McCoy, M. D., Mohammad Mirzaei, N., Razzaghi, T., Rocha, H. L., Shahriyari, L., Shmulevich, I., Stover, D. G., Sun, Y., Syeda-Mahmood, T., Wang, J., Wang, Q., Zervantonakis, I. 2022; 4: 1007784

    Abstract

    We are rapidly approaching a future in which cancer patient digital twins will reach their potential to predict cancer prevention, diagnosis, and treatment in individual patients. This will be realized based on advances in high performance computing, computational modeling, and an expanding repertoire of observational data across multiple scales and modalities. In 2020, the US National Cancer Institute, and the US Department of Energy, through a trans-disciplinary research community at the intersection of advanced computing and cancer research, initiated team science collaborative projects to explore the development and implementation of predictive Cancer Patient Digital Twins. Several diverse pilot projects were launched to provide key insights into important features of this emerging landscape and to determine the requirements for the development and adoption of cancer patient digital twins. Projects included exploring approaches to using a large cohort of digital twins to perform deep phenotyping and plan treatments at the individual level, prototyping self-learning digital twin platforms, using adaptive digital twin approaches to monitor treatment response and resistance, developing methods to integrate and fuse data and observations across multiple scales, and personalizing treatment based on cancer type. Collectively these efforts have yielded increased insights into the opportunities and challenges facing cancer patient digital twin approaches and helped define a path forward. Given the rapidly growing interest in patient digital twins, this manuscript provides a valuable early progress report of several CPDT pilot projects commenced in common, their overall aims, early progress, lessons learned and future directions that will increasingly involve the broader research community.

    View details for DOI 10.3389/fdgth.2022.1007784

    View details for PubMedID 36274654

  • AI-based analysis of CT images for rapid triage of COVID-19 patients. NPJ digital medicine Xu, Q., Zhan, X., Zhou, Z., Li, Y., Xie, P., Zhang, S., Li, X., Yu, Y., Zhou, C., Zhang, L., Gevaert, O., Lu, G. 2021; 4 (1): 75

    Abstract

    The COVID-19 pandemic overwhelms medical resources, as seen in stressed intensive care unit (ICU) capacity and the shortage of mechanical ventilation (MV). We performed CT-based analysis combined with electronic health records and clinical laboratory results on Cohort 1 (n=1662 from 17 hospitals) with prognostic estimation for the rapid stratification of PCR-confirmed COVID-19 patients. These models, validated on Cohort 2 (n=700) and Cohort 3 (n=662) constructed from nine external hospitals, achieved satisfactory performance for predicting ICU admission, MV, and death of COVID-19 patients (AUROC 0.916, 0.919, and 0.853), even for events that happened two days after admission (AUROC 0.919, 0.943, and 0.856). Clinical and image features played complementary roles in prediction and provided accurate estimates of the time to progression (p<0.001). Our findings are valuable for optimizing the use of medical resources in the COVID-19 pandemic. The models are available here: https://github.com/terryli710/COVID_19_Rapid_Triage_Risk_Predictor .

    View details for DOI 10.1038/s41746-021-00446-z

    View details for PubMedID 33888856

  • CT-based Radiomic Signatures for Predicting Histopathologic Features in Head and Neck Squamous Cell Carcinoma. Radiology. Imaging cancer Mukherjee, P., Cintra, M., Huang, C., Zhou, M., Zhu, S., Colevas, A. D., Fischbein, N., Gevaert, O. 2020; 2 (3): e190039

    Abstract

    Purpose: To determine the performance of CT-based radiomic features for noninvasive prediction of histopathologic features of tumor grade, extracapsular spread, perineural invasion, lymphovascular invasion, and human papillomavirus status in head and neck squamous cell carcinoma (HNSCC). Materials and Methods: In this retrospective study, which was approved by the local institutional ethics committee, CT images and clinical data from patients with pathologically proven HNSCC from The Cancer Genome Atlas (n = 113) and an institutional test cohort (n = 71) were analyzed. A machine learning model was trained with 2131 extracted radiomic features to predict tumor histopathologic characteristics. In the model, principal component analysis was used for dimensionality reduction, and regularized regression was used for classification. Results: The trained radiomic model demonstrated moderate capability of predicting HNSCC features. In the training cohort and the test cohort, the model achieved a mean area under the receiver operating characteristic curve (AUC) of 0.75 (95% confidence interval [CI]: 0.68, 0.81) and 0.66 (95% CI: 0.45, 0.84), respectively, for tumor grade; a mean AUC of 0.64 (95% CI: 0.55, 0.62) and 0.70 (95% CI: 0.47, 0.89), respectively, for perineural invasion; a mean AUC of 0.69 (95% CI: 0.56, 0.81) and 0.65 (95% CI: 0.38, 0.87), respectively, for lymphovascular invasion; a mean AUC of 0.77 (95% CI: 0.65, 0.88) and 0.67 (95% CI: 0.15, 0.80), respectively, for extracapsular spread; and a mean AUC of 0.71 (95% CI: 0.29, 1.0) and 0.80 (95% CI: 0.65, 0.92), respectively, for human papillomavirus status. Conclusion: Radiomic CT models have the potential to predict characteristics typically identified on pathologic assessment of HNSCC. Supplemental material is available for this article. © RSNA, 2020.

    View details for DOI 10.1148/rycan.2020190039

    View details for PubMedID 32550599

  • A Shallow Convolutional Neural Network Predicts Prognosis of Lung Cancer Patients in Multi-Institutional CT-Image Data. Nature machine intelligence Mukherjee, P., Zhou, M., Lee, E., Schicht, A., Balagurunathan, Y., Napel, S., Gillies, R., Wong, S., Thieme, A., Leung, A., Gevaert, O. 2020; 2 (5): 274-282

    Abstract

    Lung cancer is the most common fatal malignancy in adults worldwide, and non-small cell lung cancer (NSCLC) accounts for 85% of lung cancer diagnoses. Computed tomography (CT) is routinely used in clinical practice to determine lung cancer treatment and assess prognosis. Here, we developed LungNet, a shallow convolutional neural network for predicting outcomes of NSCLC patients. We trained and evaluated LungNet on four independent cohorts of NSCLC patients from four medical centers: Stanford Hospital (n = 129), H. Lee Moffitt Cancer Center and Research Institute (n = 185), MAASTRO Clinic (n = 311) and Charité - Universitätsmedizin (n=84). We show that outcomes from LungNet are predictive of overall survival in all four independent survival cohorts as measured by concordance indices of 0.62, 0.62, 0.62 and 0.58 on cohorts 1, 2, 3, and 4, respectively. Further, the survival model can be used, via transfer learning, for classifying benign vs malignant nodules on the Lung Image Database Consortium (n = 1010), with improved performance (AUC=0.85) versus training from scratch (AUC=0.82). LungNet can be used as a noninvasive predictor for prognosis in NSCLC patients and can facilitate interpretation of CT images for lung cancer stratification and prognostication.

    View details for DOI 10.1038/s42256-020-0173-6

    View details for PubMedID 33791593

    View details for PubMedCentralID PMC8008967

  • A meta-learning approach for genomic survival analysis. Nature communications Qiu, Y. L., Zheng, H., Devos, A., Selby, H., Gevaert, O. 2020; 11 (1): 6350

    Abstract

    RNA sequencing has emerged as a promising approach in cancer prognosis as sequencing data becomes more easily and affordably accessible. However, it remains challenging to build good predictive models especially when the sample size is limited and the number of features is high, which is a common situation in biomedical settings. To address these limitations, we propose a meta-learning framework based on neural networks for survival analysis and evaluate it in a genomic cancer research setting. We demonstrate that, compared to regular transfer-learning, meta-learning is a significantly more effective paradigm to leverage high-dimensional data that is relevant but not directly related to the problem of interest. Specifically, meta-learning explicitly constructs a model, from abundant data of relevant tasks, to learn a new task with few samples effectively. For the application of predicting cancer survival outcome, we also show that the meta-learning framework with a few samples is able to achieve competitive performance with learning from scratch with a significantly larger number of samples. Finally, we demonstrate that the meta-learning model implicitly prioritizes genes based on their contribution to survival prediction and allows us to identify important pathways in cancer.

    View details for DOI 10.1038/s41467-020-20167-3

    View details for PubMedID 33311484

  • Genomic data imputation with variational auto-encoders. GigaScience Qiu, Y. L., Zheng, H., Gevaert, O. 2020; 9 (8)

    Abstract

    As missing values are frequently present in genomic data, practical methods to handle missing data are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets and it is difficult to modify these algorithms to handle certain cases not missing at random. In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performances than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methodologies to leverage the prior knowledge about missing data. Furthermore, we investigate the effect of varying latent space regularization strength in VAE on the imputation performances and, in this context, show why VAE has a better imputation capacity compared to a regular deterministic auto-encoder. We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.

    View details for DOI 10.1093/gigascience/giaa082

    View details for PubMedID 32761097

  • Imaging-AMARETTO: An Imaging Genomics Software Tool to Interrogate Multiomics Networks for Relevance to Radiography and Histopathology Imaging Biomarkers of Clinical Outcomes. JCO clinical cancer informatics Gevaert, O., Nabian, M., Bakr, S., Everaert, C., Shinde, J., Manukyan, A., Liefeld, T., Tabor, T., Xu, J., Lupberger, J., Haas, B. J., Baumert, T. F., Hernaez, M., Reich, M., Quintana, F. J., Uhlmann, E. J., Krichevsky, A. M., Mesirov, J. P., Carey, V., Pochet, N. 2020; 4: 421–35

    Abstract

    The availability of increasing volumes of multiomics, imaging, and clinical data in complex diseases such as cancer opens opportunities for the formulation and development of computational imaging genomics methods that can link multiomics, imaging, and clinical data. Here, we present the Imaging-AMARETTO algorithms and software tools to systematically interrogate regulatory networks derived from multiomics data within and across related patient studies for their relevance to radiography and histopathology imaging features predicting clinical outcomes. To demonstrate its utility, we applied Imaging-AMARETTO to integrate three patient studies of brain tumors, specifically, multiomics with radiography imaging data from The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and low-grade glioma (LGG) cohorts and transcriptomics with histopathology imaging data from the Ivy Glioblastoma Atlas Project (IvyGAP) GBM cohort. Our results show that Imaging-AMARETTO recapitulates known key drivers of tumor-associated microglia and macrophage mechanisms, mediated by STAT3, AHR, and CCR2, and neurodevelopmental and stemness mechanisms, mediated by OLIG2. Imaging-AMARETTO provides interpretation of their underlying molecular mechanisms in light of imaging biomarkers of clinical outcomes and uncovers novel master drivers, THBS1 and MAP2, that establish relationships across these distinct mechanisms. Our network-based imaging genomics tools serve as hypothesis generators that facilitate the interrogation of known and uncovering of novel hypotheses for follow-up with experimental validation studies. We anticipate that our Imaging-AMARETTO imaging genomics tools will be useful to the community of biomedical researchers for applications to similar studies of cancer and other complex diseases with available multiomics, imaging, and clinical data.

    View details for DOI 10.1200/CCI.19.00125

    View details for PubMedID 32383980

  • Development of a DNA Methylation-Based Diagnostic Signature to Distinguish Benign Oncocytoma From Renal Cell Carcinoma. JCO precision oncology Brennan, K., Metzner, T. J., Kao, C. S., Massie, C. E., Stewart, G. D., Haile, R. W., Brooks, J. D., Hitchins, M. P., Leppert, J. T., Gevaert, O. 2020; 4

    Abstract

    A challenge in the diagnosis of renal cell carcinoma (RCC) is to distinguish chromophobe RCC (chRCC) from benign renal oncocytoma, because these tumor types are histologically and morphologically similar, yet they require different clinical management. Molecular biomarkers could provide a way of distinguishing oncocytoma from chRCC, which could prevent unnecessary treatment of oncocytoma. Such biomarkers could also be applied to preoperative biopsy specimens such as needle core biopsy specimens, to avoid unnecessary surgery of oncocytoma. We profiled DNA methylation in fresh-frozen oncocytoma and chRCC tumors and adjacent normal tissue and used machine learning to identify a signature of differentially methylated cytosine-phosphate-guanine sites (CpGs) that robustly distinguish oncocytoma from chRCC. Unsupervised clustering of Stanford and preexisting RCC data from The Cancer Genome Atlas (TCGA) revealed that of all RCC subtypes, oncocytoma is most similar to chRCC. Unexpectedly, however, oncocytoma features more extensive, overall abnormal methylation than does chRCC. We identified 79 CpGs with large methylation differences between oncocytoma and chRCC. A diagnostic model trained on 30 CpGs could distinguish oncocytoma from chRCC in 10-fold cross-validation (area under the receiver operating curve [AUC], 0.96 (95% CI, 0.88 to 1.00)) and could distinguish TCGA chRCCs from an independent set of oncocytomas from a previous study (AUC, 0.87). This signature also separated oncocytoma from other RCC subtypes and normal tissue, revealing it as a standalone diagnostic biomarker for oncocytoma. This CpG signature could be developed as a clinical biomarker to support differential diagnosis of oncocytoma and chRCC in surgical samples. With improved biopsy techniques, this signature could be applied to preoperative biopsy specimens.

    View details for DOI 10.1200/PO.20.00015

    View details for PubMedID 33015531

    View details for PubMedCentralID PMC7529536

  • Whole slide images reflect DNA methylation patterns of human tumors. NPJ genomic medicine Zheng, H., Momeni, A., Cedoz, P. L., Vogel, H., Gevaert, O. 2020; 5: 11

    Abstract

    DNA methylation is an important epigenetic mechanism regulating gene expression and its role in carcinogenesis has been extensively studied. High-throughput DNA methylation assays have been used broadly in cancer research. Histopathology images are commonly obtained in cancer treatment, given that tissue sampling remains the clinical gold-standard for diagnosis. In this work, we investigate the interaction between cancer histopathology images and DNA methylation profiles to provide a better understanding of tumor pathobiology at the epigenetic level. We demonstrate that classical machine learning algorithms can associate the DNA methylation profiles of cancer samples with morphometric features extracted from whole slide images. Furthermore, grouping the genes into methylation clusters greatly improves the performance of the models. The well-predicted genes are enriched in key pathways in carcinogenesis including hypoxia in glioma and angiogenesis in renal cell carcinoma. Our results provide new insights into the link between histopathological and molecular data.

    View details for DOI 10.1038/s41525-020-0120-9

    View details for PubMedID 32194984

    View details for PubMedCentralID PMC7064513

  • The impact of DNA methylation on the cancer proteome. PLoS computational biology Magzoub, M. M., Prunello, M., Brennan, K., Gevaert, O. 2019; 15 (7): e1007245

    Abstract

    Aberrant DNA methylation disrupts normal gene expression in cancer and broadly contributes to oncogenesis. We previously developed MethylMix, a model-based algorithmic approach to identify epigenetically regulated driver genes. MethylMix identifies genes where methylation likely executes a functional role by using transcriptomic data to select only methylation events that can be linked to changes in gene expression. However, given that proteins more closely link genotype to phenotype, recent high-throughput proteomic data provide an opportunity to more accurately identify functionally relevant abnormal methylation events. Here we present a MethylMix analysis that refines nominations for epigenetic driver genes by leveraging quantitative high-throughput proteomic data to select only genes where DNA methylation is predictive of protein abundance. Applying our algorithm across three cancer cohorts, we find that using protein abundance data narrows candidate nominations, where the effect of DNA methylation is often buffered at the protein level. Next, we find that MethylMix genes predictive of protein abundance are enriched for biological processes involved in cancer, including functions involved in epithelial and mesenchymal transition. Moreover, our results are also enriched for tumor markers that are predictive of clinical features like tumor stage, and we find that clustering using MethylMix genes predictive of protein abundance captures cancer subtypes.

    View details for DOI 10.1371/journal.pcbi.1007245

    View details for PubMedID 31356589

  • Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics (Oxford, England) Cheerla, A., Gevaert, O. 2019; 35 (14): i446-i454

    Abstract

    Estimating the future course of patients with cancer lesions is invaluable to physicians; however, current clinical methods fail to effectively use the vast amount of multimodal data that is available for cancer patients. To tackle this problem, we constructed a multimodal neural network-based model to predict the survival of patients for 20 different cancer types using clinical data, mRNA expression data, microRNA expression data and histopathology whole slide images (WSIs). We developed an unsupervised encoder to compress these four data modalities into a single feature vector for each patient, handling missing data through a resilient, multimodal dropout method. Encoding methods were tailored to each data type, using deep highway networks to extract features from clinical and genomic data, and convolutional neural networks to extract features from WSIs. We used pancancer data to train these feature encodings and predict single cancer and pancancer overall survival, achieving a C-index of 0.78 overall. This work shows that it is possible to build a pancancer model for prognosis that also predicts prognosis in single cancer sites. Furthermore, our model handles multiple data modalities, efficiently analyzes WSIs and represents patient multimodal data flexibly into an unsupervised, informative representation. We thus present a powerful automated tool to accurately determine prognosis, a key step towards personalized treatment for cancer patients. https://github.com/gevaertlab/MultimodalPrognosis.

    View details for DOI 10.1093/bioinformatics/btz342

    View details for PubMedID 31510656

  • Development and validation of radiomic signatures of head and neck squamous cell carcinoma molecular features and subtypes. EBioMedicine Huang, C., Cintra, M., Brennan, K., Zhou, M., Colevas, A. D., Fischbein, N., Zhu, S., Gevaert, O. 2019

    Abstract

    Radiomics-based non-invasive biomarkers are promising to facilitate the translation of therapeutically related molecular subtypes for treatment allocation of patients with head and neck squamous cell carcinoma (HNSCC). We included 113 HNSCC patients from The Cancer Genome Atlas (TCGA-HNSCC) project. Molecular phenotypes analyzed were RNA-defined HPV status, five DNA methylation subtypes, four gene expression subtypes and five somatic gene mutations. A total of 540 quantitative image features were extracted from pre-treatment CT scans. Features were selected and used in a regularized logistic regression model to build binary classifiers for each molecular subtype. Models were evaluated using the average area under the Receiver Operator Characteristic curve (AUC) of a stratified 10-fold cross-validation procedure repeated 10 times. Next, an HPV model was trained with the TCGA-HNSCC cohort and tested on a Stanford cohort (N = 53). Our results show that quantitative image features are capable of distinguishing several molecular phenotypes. We obtained significant predictive performance for RNA-defined HPV+ (AUC = 0.73), DNA methylation subtypes MethylMix HPV+ (AUC = 0.79), non-CIMP-atypical (AUC = 0.77) and Stem-like-Smoking (AUC = 0.71), and mutation of NSD1 (AUC = 0.73). We externally validated the HPV prediction model (AUC = 0.76) on the Stanford cohort. When compared to clinical models, radiomic models were superior for subtypes such as NOTCH1 mutation and DNA methylation subtype non-CIMP-atypical, but inferior for DNA methylation subtype CIMP-atypical and NSD1 mutation. Our study demonstrates that radiomics can potentially serve as a non-invasive tool to identify treatment-relevant subtypes of HNSCC, opening up the possibility for patient stratification, treatment allocation and inclusion in clinical trials. FUND: Dr. Gevaert reports grants from National Institute of Dental & Craniofacial Research (NIDCR) U01 DE025188, grants from National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health (NIBIB), R01 EB020527, grants from National Cancer Institute (NCI), U01 CA217851, during the conduct of the study; Dr. Huang and Dr. Zhu report grants from China Scholarship Council (Grant NO:201606320087), grants from China Medical Board Collaborating Program (Grant NO:15-216), the Cyrus Tang Foundation, and the Zhejiang University Education Foundation during the conduct of the study; Dr. Cintra reports grants from São Paulo State Foundation for Teaching and Research (FAPESP), during the conduct of the study.

    View details for DOI 10.1016/j.ebiom.2019.06.034

    View details for PubMedID 31255659

  • Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples. GigaScience Zheng, H., Brennan, K., Hernaez, M., Gevaert, O. 2019; 8 (12)

    Abstract

    Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification. In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods. Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.

    View details for DOI 10.1093/gigascience/giz145

    View details for PubMedID 31808800

  • MethylMix 2.0: an R package for identifying DNA methylation genes. Bioinformatics (Oxford, England) Cedoz, P., Prunello, M., Brennan, K., Gevaert, O. 2018

    Abstract

    Summary: DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper and hypomethylation of genes is a major mechanism of gene expression deregulation in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed generating vast amounts of genome wide DNA methylation measurements. We developed MethylMix, an algorithm implemented in R to identify disease specific hyper and hypomethylated genes. Here we present a new version of MethylMix that automates the construction of DNA-methylation and gene expression datasets from The Cancer Genome Atlas (TCGA). More precisely, MethylMix 2.0 incorporates two major updates: the automated downloading of DNA methylation and gene expression datasets from TCGA and the automated preprocessing of such datasets: value imputation, batch correction and CpG sites clustering within each gene. The resulting datasets can subsequently be analyzed with MethylMix to identify transcriptionally predictive methylation states. We show that the Differential Methylation Values created by MethylMix can be used for cancer subtyping. Contact: olivier.gevaert@stanford.edu. Documentation: https://bioconductor.org/packages/release/bioc/manuals/MethylMix/man/MethylMix.pdf. Availability and implementation: MethylMix 2.0 was implemented as an R package and is available in bioconductor.

    View details for PubMedID 29668835

  • Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation. Cell Malta, T. M., Sokolov, A., Gentles, A. J., Burzykowski, T., Poisson, L., Weinstein, J. N., Kaminska, B., Huelsken, J., Omberg, L., Gevaert, O., Colaprico, A., Czerwinska, P., Mazurek, S., Mishra, L., Heyn, H., Krasnitz, A., Godwin, A. K., Lazar, A. J., Stuart, J. M., Hoadley, K. A., Laird, P. W., Noushmehr, H., Wiznerowicz, M., Cancer Genome Atlas Research Network 2018; 173 (2): 338-+

    Abstract

    Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.

    View details for PubMedID 29625051

  • Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas. Cell reports Campbell, J. D., Yau, C., Bowlby, R., Liu, Y., Brennan, K., Fan, H., Taylor, A. M., Wang, C., Walter, V., Akbani, R., Byers, L., Creighton, C. J., Coarfa, C., Shih, J., Cherniack, A. D., Gevaert, O., Prunello, M., Shen, H., Anur, P., Chen, J., Cheng, H., Hayes, D., Bullman, S., Pedamallu, C., Ojesina, A. I., Sadeghi, S., Mungall, K. L., Robertson, A., Benz, C., Schultz, A., Kanchi, R. S., Gay, C. M., Hegde, A., Diao, L., Wang, J., Ma, W., Sumazin, P., Chiu, H., Chen, T., Gunaratne, P., Donehower, L., Rader, J. S., Zuna, R., Al-Ahmadie, H., Lazar, A. J., Flores, E. R., Tsai, K. Y., Zhou, J. H., Rustgi, A. K., Drill, E., Shen, R., Wong, C. K., Stuart, J. M., Laird, P. W., Hoadley, K. A., Weinstein, J. N., Peto, M., Pickering, C. R., Chen, Z., Van Waes, C., Cancer Genome Atlas Research Network 2018; 23 (1): 194-+

    Abstract

    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smoking and/or human papillomavirus (HPV). SCCs harbor 3q, 5p, and other recurrent chromosomal copy-number alterations (CNAs), DNA mutations, and/or aberrant methylation of genes and microRNAs, which are correlated with the expression of multi-gene programs linked to squamous cell stemness, epithelial-to-mesenchymal differentiation, growth, genomic integrity, oxidative damage, death, and inflammation. Low-CNA SCCs tended to be HPV(+) and display hypermethylation with repression of TET1 demethylase and FANCF, previously linked to predisposition to SCC, or harbor mutations affecting CASP8, RAS-MAPK pathways, chromatin modifiers, and immunoregulatory molecules. We uncovered hypomethylation of the alternative promoter that drives expression of the ΔNp63 oncogene and embedded miR944. Co-expression of immune checkpoint, T-regulatory, and Myeloid suppressor cells signatures may explain reduced efficacy of immune therapy. These findings support possibilities for molecular classification and therapeutic approaches.

    View details for PubMedID 29617660

  • Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response. EBioMedicine Champion, M., Brennan, K., Croonenborghs, T., Gentles, A. J., Pochet, N., Gevaert, O. 2018; 27: 156–66

    Abstract

    The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and 'antiviral' interferon-modulated innate immune response. AMARETTO is available as an R package at https://bitbucket.org/gevaertlab/pancanceramaretto.

    View details for PubMedID 29331675

  • Intestinal Enteroendocrine Lineage Cells Possess Homeostatic and Injury-Inducible Stem Cell Activity. Cell stem cell Yan, K. S., Gevaert, O., Zheng, G. X., Anchang, B., Probert, C. S., Larkin, K. A., Davies, P. S., Cheng, Z. F., Kaddis, J. S., Han, A., Roelf, K., Calderon, R. I., Cynn, E., Hu, X., Mandleywala, K., Wilhelmy, J., Grimes, S. M., Corney, D. C., Boutet, S. C., Terry, J. M., Belgrader, P., Ziraldo, S. B., Mikkelsen, T. S., Wang, F., von Furstenberg, R. J., Smith, N. R., Chandrakesan, P., May, R., Chrissy, M. A., Jain, R., Cartwright, C. A., Niland, J. C., Hong, Y. K., Carrington, J., Breault, D. T., Epstein, J., Houchen, C. W., Lynch, J. P., Martin, M. G., Plevritis, S. K., Curtis, C., Ji, H. P., Li, L., Henning, S. J., Wong, M. H., Kuo, C. J. 2017; 21 (1): 78-90.e6

    Abstract

    Several cell populations have been reported to possess intestinal stem cell (ISC) activity during homeostasis and injury-induced regeneration. Here, we explored inter-relationships between putative mouse ISC populations by comparative RNA-sequencing (RNA-seq). The transcriptomes of multiple cycling ISC populations closely resembled Lgr5+ ISCs, the most well-defined ISC pool, but Bmi1-GFP+ cells were distinct and enriched for enteroendocrine (EE) markers, including Prox1. Prox1-GFP+ cells exhibited sustained clonogenic growth in vitro, and lineage-tracing of Prox1+ cells revealed long-lived clones during homeostasis and after radiation-induced injury in vivo. Single-cell mRNA-seq revealed two subsets of Prox1-GFP+ cells, one of which resembled mature EE cells while the other displayed low-level EE gene expression but co-expressed tuft cell markers, Lgr5 and Ascl2, reminiscent of label-retaining secretory progenitors. Our data suggest that the EE lineage, including mature EE cells, comprises a reservoir of homeostatic and injury-inducible ISCs, extending our understanding of cellular plasticity and stemness.

    View details for DOI 10.1016/j.stem.2017.06.014

    View details for PubMedID 28686870

    View details for PubMedCentralID PMC5642297

  • Identification of an atypical etiological head and neck squamous carcinoma subtype featuring the CpG island methylator phenotype. EBioMedicine Brennan, K., Koenig, J. L., Gentles, A. J., Sunwoo, J. B., Gevaert, O. 2017; 17: 223-236

    Abstract

    Head and neck squamous cell carcinoma (HNSCC) is broadly classified into HNSCC associated with human papilloma virus (HPV) infection, and HPV negative HNSCC, which is typically smoking-related. A subset of HPV negative HNSCCs occur in patients without smoking history, however, and these etiologically 'atypical' HNSCCs disproportionately occur in the oral cavity, and in female patients, suggesting a distinct etiology. To investigate the determinants of clinical and molecular heterogeneity, we performed unsupervised clustering to classify 528 HNSCC patients from The Cancer Genome Atlas (TCGA) into putative intrinsic subtypes based on their profiles of epigenetically (DNA methylation) deregulated genes. HNSCCs clustered into five subtypes, including one HPV positive subtype, two smoking-related subtypes, and two atypical subtypes. One atypical subtype was particularly genomically stable, but featured widespread gene silencing associated with the 'CpG island methylator phenotype' (CIMP). Further distinguishing features of this 'CIMP-Atypical' subtype include an antiviral gene expression profile associated with pro-inflammatory M1 macrophages and CD8+ T cell infiltration, CASP8 mutations, and a well-differentiated state corresponding to normal SOX2 copy number and SOX2OT hypermethylation. We developed a gene expression classifier for the CIMP-Atypical subtype that could classify atypical disease features in two independent patient cohorts, demonstrating the reproducibility of this subtype. Taken together, these findings provide unprecedented evidence that atypical HNSCC is molecularly distinct, and postulate the CIMP-Atypical subtype as a distinct clinical entity that may be caused by chronic inflammation.

    View details for DOI 10.1016/j.ebiom.2017.02.025

    View details for PubMedID 28314692

  • Noninvasive radiomics signature based on quantitative analysis of computed tomography images as a surrogate for microvascular invasion in hepatocellular carcinoma: a pilot study. Journal of medical imaging (Bellingham, Wash.) Bakr, S., Echegaray, S., Shah, R., Kamaya, A., Louie, J., Napel, S., Kothary, N., Gevaert, O. 2017; 4 (4): 041303

    Abstract

    We explore noninvasive biomarkers of microvascular invasion (mVI) in patients with hepatocellular carcinoma (HCC) using quantitative and semantic image features extracted from contrast-enhanced, triphasic computed tomography (CT). Under institutional review board approval, we selected 28 treatment-naive HCC patients who underwent surgical resection. Four radiologists independently selected and delineated tumor margins on three axial CT images and extracted computational features capturing tumor shape, image intensities, and texture. We also computed two types of "delta features," defined as the absolute difference and the ratio computed from all pairs of imaging phases for each feature. 717 arterial, portal-venous, delayed single-phase, and delta-phase features were robust against interreader variability ([Formula: see text]). An enhanced cross-validation analysis showed that combining robust single-phase and delta features in the arterial and venous phases identified mVI (AUC [Formula: see text]). Compared to a previously reported semantic feature signature (AUC 0.47 to 0.58), these features in our cohort showed only slight to moderate agreement (Cohen's kappa range: 0.03 to 0.59). Though preliminary, quantitative analysis of image features in arterial and venous phases may be potential surrogate biomarkers for mVI in HCC. Further study in a larger cohort is warranted.

    View details for PubMedID 28840174
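    The "delta features" described in the abstract above (the absolute difference and the ratio of a feature across every pair of imaging phases) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `delta_features`, the phase labels, the example values, and the handling of a zero denominator are all assumptions made for the example.

```python
from itertools import combinations


def delta_features(feature_by_phase):
    """Compute delta features for one feature measured in several phases.

    feature_by_phase maps a phase name to the feature value in that phase.
    For every unordered pair of phases (in insertion order), return the
    absolute difference and the ratio (first phase over second phase).
    """
    deltas = {}
    for a, b in combinations(feature_by_phase, 2):
        va, vb = feature_by_phase[a], feature_by_phase[b]
        deltas[f"{a}-{b}:absdiff"] = abs(va - vb)
        # Assumption: a ratio with a zero denominator is reported as None.
        deltas[f"{a}-{b}:ratio"] = va / vb if vb != 0 else None
    return deltas


# Hypothetical mean-intensity values for one tumor in three CT phases.
example = {"arterial": 80.0, "portal_venous": 100.0, "delayed": 50.0}
result = delta_features(example)
```

    With three phases, each single-phase feature yields three phase pairs and therefore six delta features (one absolute difference and one ratio per pair).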

  • Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities. Science translational medicine Itakura, H., Achrol, A. S., Mitchell, L. A., Loya, J. J., Liu, T., Westbroek, E. M., Feroze, A. H., Rodriguez, S., Echegaray, S., Azad, T. D., Yeom, K. W., Napel, S., Rubin, D. L., Chang, S. D., Harsh, G. R., Gevaert, O. 2015; 7 (303): 303ra138

    Abstract

    Glioblastoma (GBM) is the most common and highly lethal primary malignant brain tumor in adults. There is a dire need for easily accessible, noninvasive biomarkers that can delineate underlying molecular activities and predict response to therapy. To this end, we sought to identify subtypes of GBM, differentiated solely by quantitative magnetic resonance (MR) imaging features, that could be used for better management of GBM patients. Quantitative image features capturing the shape, texture, and edge sharpness of each lesion were extracted from MR images of 121 single-institution patients with de novo, solitary, unilateral GBM. Three distinct phenotypic "clusters" emerged in the development cohort using consensus clustering with 10,000 iterations on these image features. These three clusters--pre-multifocal, spherical, and rim-enhancing, names reflecting their image features--were validated in an independent cohort consisting of 144 multi-institution patients with similar tumor characteristics from The Cancer Genome Atlas (TCGA). Each cluster mapped to a unique set of molecular signaling pathways using pathway activity estimates derived from the analysis of TCGA tumor copy number and gene expression data with the PARADIGM (Pathway Recognition Algorithm Using Data Integration on Genomic Models) algorithm. Distinct pathways, such as c-Kit and FOXA, were enriched in each cluster, indicating differential molecular activities as determined by the image features. Each cluster also demonstrated differential probabilities of survival, indicating prognostic importance. Our imaging method offers a noninvasive approach to stratify GBM patients and also provides unique sets of molecular signatures to inform targeted therapy and personalized treatment of GBM.

    View details for DOI 10.1126/scitranslmed.aaa7582

    View details for PubMedID 26333934

  • Pancancer analysis of DNA methylation-driven genes using MethylMix. Genome biology Gevaert, O., Tibshirani, R., Plevritis, S. K. 2015; 16

    Abstract

    Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites, and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues.

    View details for DOI 10.1186/s13059-014-0579-8

    View details for Web of Science ID 000351817300001

    View details for PubMedID 25631659

    View details for PubMedCentralID PMC4365533

  • CaMoDi: a new method for cancer module discovery. BMC genomics Manolakos, A., Ochoa, I., Venkat, K., Goldsmith, A. J., Gevaert, O. 2014; 15

    Abstract

    Identification of genomic patterns in tumors is an important problem, which would enable the community to understand and extend effective therapies across the current tissue-based tumor boundaries. With this in mind, in this work we develop a robust and fast algorithm to discover cancer driver genes using an unsupervised clustering of similarly expressed genes across cancer patients. Specifically, we introduce CaMoDi, a new method for module discovery which demonstrates superior performance across a number of computational and statistical metrics. The proposed algorithm CaMoDi demonstrates effective statistical performance compared to the state of the art, and is algorithmically simple and scalable, which makes it suitable for tissue-independent genomic characterization of individual tumors as well as groups of tumors. We perform an extensive comparative study between CaMoDi and two previously developed methods (CONEXIC and AMARETTO), across 11 individual tumors and 8 combinations of tumors from The Cancer Genome Atlas. We demonstrate that CaMoDi is able to discover modules with better average consistency and homogeneity, with similar or better adjusted R2 performance compared to CONEXIC and AMARETTO. We present a novel method for Cancer Module Discovery, CaMoDi, and demonstrate through extensive simulations on the TCGA Pan-Cancer dataset that it achieves comparable or better performance than that of CONEXIC and AMARETTO, while achieving an order-of-magnitude improvement in computational run time compared to the other methods.

    View details for DOI 10.1186/1471-2164-15-S10-S8

    View details for Web of Science ID 000346166900008

    View details for PubMedID 25560933

  • Glioblastoma Multiforme: Exploratory Radiogenomic Analysis by Using Quantitative Image Features. Radiology Gevaert, O., Mitchell, L. A., Achrol, A. S., Xu, J., Echegaray, S., Steinberg, G. K., Cheshier, S. H., Napel, S., Zaharchuk, G., Plevritis, S. K. 2014; 273 (1): 168-174

    Abstract

    To derive quantitative image features from magnetic resonance (MR) images that characterize the radiographic phenotype of glioblastoma multiforme (GBM) lesions and to create radiogenomic maps associating these features with various molecular data. Clinical, molecular, and MR imaging data for GBMs in 55 patients were obtained from the Cancer Genome Atlas and the Cancer Imaging Archive after local ethics committee and institutional review board approval. Regions of interest (ROIs) corresponding to enhancing necrotic portions of tumor and peritumoral edema were drawn, and quantitative image features were derived from these ROIs. Robust quantitative image features were defined on the basis of an intraclass correlation coefficient of 0.6 for a digital algorithmic modification and a test-retest analysis. The robust features were visualized by using hierarchic clustering and were correlated with survival by using Cox proportional hazards modeling. Next, these robust image features were correlated with manual radiologist annotations from the Visually Accessible Rembrandt Images (VASARI) feature set and GBM molecular subgroups by using nonparametric statistical tests. A bioinformatic algorithm was used to create gene expression modules, defined as a set of coexpressed genes together with a multivariate model of cancer driver genes predictive of the module's expression pattern. Modules were correlated with robust image features by using the Spearman correlation test to create radiogenomic maps and to link robust image features with molecular pathways. Eighteen image features passed the robustness analysis and were further analyzed for the three types of ROIs, for a total of 54 image features. Three enhancement features were significantly correlated with survival, 77 significant correlations were found between robust quantitative features and the VASARI feature set, and seven image features were correlated with molecular subgroups (P < .05 for all). A radiogenomics map was created to link image features with gene expression modules and allowed linkage of 56% (30 of 54) of the image features with biologic processes. Radiogenomic approaches in GBM have the potential to predict clinical and molecular characteristics of tumors noninvasively. Online supplemental material is available for this article.

    View details for DOI 10.1148/radiol.14131731

    View details for Web of Science ID 000344232100019

    View details for PubMedCentralID PMC4263772

  • Identifying master regulators of cancer and their downstream targets by integrating genomic and epigenomic features. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Gevaert, O., Plevritis, S. 2013: 123-134

    Abstract

    Vast amounts of molecular data characterizing the genome, epigenome and transcriptome are becoming available for a variety of cancers. The current challenge is to integrate these diverse layers of molecular biology information to create a more comprehensive view of key biological processes underlying cancer. We developed a biocomputational algorithm that integrates copy number, DNA methylation, and gene expression data to study master regulators of cancer and identify their targets. Our algorithm starts by generating a list of candidate driver genes based on the rationale that genes that are driven by multiple genomic events in a subset of samples are unlikely to be randomly deregulated. We then select the master regulators from the candidate drivers and identify their targets by inferring the underlying regulatory network of gene expression. We applied our biocomputational algorithm to identify master regulators and their targets in glioblastoma multiforme (GBM) and serous ovarian cancer. Our results suggest that the expression of candidate drivers is more likely to be influenced by copy number variations than DNA methylation. Next, we selected the master regulators and identified their downstream targets using module networks analysis. As a proof-of-concept, we show that the GBM and ovarian cancer module networks recapitulate known processes in these cancers. In addition, we identify master regulators that have not been previously reported and suggest their likely role. In summary, focusing on genes whose expression can be explained by their genomic and epigenomic aberrations is a promising strategy to identify master regulators of cancer.

    View details for PubMedID 23424118

  • Non-Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene Expression Microarray Data-Methods and Preliminary Results. Radiology Gevaert, O., Xu, J., Hoang, C. D., Leung, A. N., Xu, Y., Quon, A., Rubin, D. L., Napel, S., Plevritis, S. K. 2012; 264 (2): 387-396

    Abstract

    To identify prognostic imaging biomarkers in non-small cell lung cancer (NSCLC) by means of a radiogenomics strategy that integrates gene expression and medical images in patients for whom survival outcomes are not available by leveraging survival data in public gene expression data sets. A radiogenomics strategy for associating image features with clusters of coexpressed genes (metagenes) was defined. First, a radiogenomics correlation map is created for a pairwise association between image features and metagenes. Next, predictive models of metagenes are built in terms of image features by using sparse linear regression. Similarly, predictive models of image features are built in terms of metagenes. Finally, the prognostic significance of the predicted image features is evaluated in a public gene expression data set with survival outcomes. This radiogenomics strategy was applied to a cohort of 26 patients with NSCLC for whom gene expression and 180 image features from computed tomography (CT) and positron emission tomography (PET)/CT were available. There were 243 statistically significant pairwise correlations between image features and metagenes of NSCLC. Metagenes were predicted in terms of image features with an accuracy of 59%-83%. One hundred fourteen of 180 CT image features and the PET standardized uptake value were predicted in terms of metagenes with an accuracy of 65%-86%. When the predicted image features were mapped to a public gene expression data set with survival outcomes, tumor size, edge shape, and sharpness ranked highest for prognostic significance. This radiogenomics strategy for identifying imaging biomarkers may enable a more rapid evaluation of novel imaging modalities, thereby accelerating their translation to personalized medicine.

    View details for DOI 10.1148/radiol.12111607

    View details for PubMedID 22723499

  • A 3D lung lesion variational autoencoder. Cell reports methods Li, Y., Sadée, C. Y., Carrillo-Perez, F., Selby, H. M., Thieme, A. H., Gevaert, O. 2024: 100695

    Abstract

    In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis, countering the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion sizes in its latent embeddings, with a significant correlation with lesion size found after applying uniform manifold approximation and projection (UMAP) for dimensionality reduction. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.

    View details for DOI 10.1016/j.crmeth.2024.100695

    View details for PubMedID 38278157

  • Natural language processing system for rapid detection and intervention of mental health crisis chat messages. NPJ digital medicine Swaminathan, A., Lopez, I., Mar, R. A., Heist, T., McClintock, T., Caoili, K., Grace, M., Rubashkin, M., Boggs, M. N., Chen, J. H., Gevaert, O., Mou, D., Nock, M. K. 2023; 6 (1): 213

    Abstract

    Patients experiencing mental health crises often seek help through messaging-based platforms, but may face long wait times due to limited message triage capacity. Here we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system with key word filtering followed by logistic regression on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N=481) and a prospective test set (10/1/22-10/31/22, N=102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared to 9 h before the deployment of the system. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.

    View details for DOI 10.1038/s41746-023-00951-3

    View details for PubMedID 37990134
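    The two-stage design named in the abstract above (key word filtering followed by logistic regression) can be illustrated with a small sketch. Everything specific here is hypothetical: the keyword list, the weights, the bias, and the threshold are made up for the example, whereas the published system learned its model from labeled chat messages. The sketch only shows how a cheap filter can gate a probabilistic second stage.

```python
import math

# Hypothetical keyword list and hand-set weights, for illustration only.
CRISIS_KEYWORDS = {"suicidal", "hurt", "unsafe"}
WEIGHTS = {"suicidal": 2.5, "hurt": 1.5, "unsafe": 1.5, "knee": -2.0}
BIAS = -1.0


def tokenize(message):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [w.strip(".,!?").lower() for w in message.split()]


def keyword_filter(tokens):
    """Stage 1: pass only messages containing at least one keyword."""
    return any(t in CRISIS_KEYWORDS for t in tokens)


def crisis_probability(tokens):
    """Stage 2: logistic model over bag-of-words counts."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def triage(message, threshold=0.5):
    """Flag a message only if it passes stage 1 and scores high in stage 2."""
    tokens = tokenize(message)
    if not keyword_filter(tokens):
        return False
    return crisis_probability(tokens) >= threshold
```

    In this toy setup, `triage("When is my next appointment?")` is rejected by the keyword filter, while `triage("My knee hurt after the run")` passes the filter but is scored below the threshold by the second stage.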

  • Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19. Research square Er, A. G., Ding, D. Y., Er, B., Uzun, M., Cakmak, M., Sadee, C., Durhan, G., Ozmen, M. N., Tanriover, M. D., Topeli, A., Son, Y. A., Tibshirani, R., Unal, S., Gevaert, O. 2023

    Abstract

    Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.

    View details for DOI 10.21203/rs.3.rs-3569833/v1

    View details for PubMedID 38045288

  • Loss of p53-DREAM-mediated repression of cell cycle genes as a driver of lymph node metastasis in head and neck cancer. Genome medicine Brennan, K., Espin-Perez, A., Chang, S., Bedi, N., Saumyaa, S., Shin, J. H., Plevritis, S. K., Gevaert, O., Sunwoo, J. B., Gentles, A. J. 2023; 15 (1): 98

    Abstract

    BACKGROUND: The prognosis for patients with head and neck cancer (HNC) is poor and has improved little in recent decades, partially due to lack of therapeutic options. To identify effective therapeutic targets, we sought to identify molecular pathways that drive metastasis and HNC progression, through large-scale systematic analyses of transcriptomic data. METHODS: We performed meta-analysis across 29 gene expression studies including 2074 primary HNC biopsies to identify genes and transcriptional pathways associated with survival and lymph node metastasis (LNM). To understand the biological roles of these genes in HNC, we identified their associated cancer pathways, as well as the cell types that express them within HNC tumor microenvironments, by integrating single-cell RNA-seq and bulk RNA-seq from sorted cell populations. RESULTS: Patient survival-associated genes were heterogeneous and included drivers of diverse tumor biological processes: these included tumor-intrinsic processes such as epithelial dedifferentiation and epithelial to mesenchymal transition, as well as tumor microenvironmental factors such as T cell-mediated immunity and cancer-associated fibroblast activity. Unexpectedly, LNM-associated genes were almost universally associated with epithelial dedifferentiation within malignant cells. Genes negatively associated with LNM consisted of regulators of squamous epithelial differentiation that are expressed within well-differentiated malignant cells, while those positively associated with LNM represented cell cycle regulators that are normally repressed by the p53-DREAM pathway. These pro-LNM genes are overexpressed in proliferating malignant cells of TP53 mutated and HPV+ve HNCs and are strongly associated with stemness, suggesting that they represent markers of pre-metastatic cancer stem-like cells. LNM-associated genes are deregulated in high-grade oral precancerous lesions, and deregulated further in primary HNCs with advancing tumor grade and deregulated further still in lymph node metastases. CONCLUSIONS: In HNC, patient survival is affected by multiple biological processes and is strongly influenced by the tumor immune and stromal microenvironments. In contrast, LNM appears to be driven primarily by malignant cell plasticity, characterized by epithelial dedifferentiation coupled with EMT-independent proliferation and stemness. Our findings postulate that LNM is initially caused by loss of p53-DREAM-mediated repression of cell cycle genes during early tumorigenesis.

    View details for DOI 10.1186/s13073-023-01236-w

    View details for PubMedID 37978395

  • Digital profiling of cancer transcriptomes from histology images with grouped vision attention. bioRxiv : the preprint server for biology Zheng, Y., Pizurica, M., Carrillo-Perez, F., Noor, H., Yao, W., Wohlfart, C., Marchal, K., Vladimirova, A., Gevaert, O. 2023

    Abstract

    Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. RNA-sequencing has emerged as a potent tool to unravel the transcriptional heterogeneity. However, large-scale characterization of cancer transcriptomes is hindered by the limitations of costs and tissue accessibility. Here, we develop SEQUOIA, a deep learning model employing a transformer architecture to predict cancer transcriptomes from whole-slide histology images. We pre-train the model using data from 2,242 normal tissues, and the model is fine-tuned and evaluated in 4,218 tumor samples across nine cancer types. The results are further validated across two independent cohorts comprising 1,305 tumors. The highest performance was observed in cancers from breast, kidney and lung, where SEQUOIA accurately predicted 13,798, 10,922 and 9,735 genes, respectively. The well predicted genes are associated with the regulation of inflammatory response, cell cycles and hypoxia-related metabolic pathways. Leveraging the well predicted genes, we develop a digital signature to predict the risk of recurrence in breast cancer. While the model is trained at the tissue-level, we showcase its potential in predicting spatial gene expression patterns using spatial transcriptomics datasets. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.

    View details for DOI 10.1101/2023.09.28.560068

    View details for PubMedID 37808782

  • Selective prediction for extracting unstructured clinical data. Journal of the American Medical Informatics Association : JAMIA Swaminathan, A., Lopez, I., Wang, W., Srivastava, U., Tran, E., Bhargava-Shah, A., Wu, J. Y., Ren, A. L., Caoili, K., Bui, B., Alkhani, L., Lee, S., Mohit, N., Seo, N., Macedo, N., Cheng, W., Liu, C., Thomas, R., Chen, J. H., Gevaert, O. 2023

    Abstract

    While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), and abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.

    View details for DOI 10.1093/jamia/ocad182

    View details for PubMedID 37769323
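    Selective prediction, as described in the abstract above, lets a classifier abstain and then weighs false positives, false negatives, and abstentions by their costs. The sketch below is one simple way to express that idea, assuming a probability band in which the model abstains; the thresholds, cost values, function names, and toy data are hypothetical and are not taken from the paper.

```python
def selective_predict(prob, low=0.3, high=0.7):
    """Return 1 or 0 when confident, or None to abstain."""
    if prob >= high:
        return 1
    if prob <= low:
        return 0
    return None


def total_cost(probs, labels, cost_fp=1.0, cost_fn=1.0, cost_abstain=0.25,
               low=0.3, high=0.7):
    """Sum the cost of false positives, false negatives, and abstentions."""
    cost = 0.0
    for p, y in zip(probs, labels):
        pred = selective_predict(p, low, high)
        if pred is None:
            cost += cost_abstain      # note deferred to manual review
        elif pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost


# Toy predicted probabilities and true labels (hypothetical).
probs = [0.95, 0.55, 0.45, 0.10, 0.80]
labels = [1, 0, 1, 0, 0]
selective = total_cost(probs, labels)                        # abstains on 0.55 and 0.45
non_selective = total_cost(probs, labels, low=0.5, high=0.5)  # never abstains
```

    On this toy data the selective classifier pays for two abstentions and one false positive, while the non-selective one pays for three errors, so abstaining lowers the total cost only because an abstention is assumed to be cheaper than a mistake.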

  • Performance of alternative manual and automated deep learning segmentation techniques for the prediction of benign and malignant lung nodules. Journal of medical imaging (Bellingham, Wash.) Selby, H. M., Mukherjee, P., Parham, C., Malik, S. B., Gevaert, O., Napel, S., Shah, R. P. 2023; 10 (4): 044006

    Abstract

    We aim to evaluate the performance of radiomic biopsy (RB), best-fit bounding box (BB), and a deep-learning-based segmentation method called no-new-U-Net (nnU-Net), compared to the standard full manual (FM) segmentation method for predicting benign and malignant lung nodules using a computed tomography (CT) radiomic machine learning model. A total of 188 CT scans of lung nodules from 2 institutions were used for our study. One radiologist identified and delineated all 188 lung nodules, whereas a second radiologist segmented a subset (n=20) of these nodules. Both radiologists employed FM and RB segmentation methods. BB segmentations were generated computationally from the FM segmentations. The nnU-Net, a deep-learning-based segmentation method, performed automatic nodule detection and segmentation. The time radiologists took to perform segmentations was recorded. Radiomic features were extracted from each segmentation method, and models to predict benign and malignant lung nodules were developed. The Kruskal-Wallis and DeLong tests were used to compare segmentation times and areas under the curve (AUC), respectively. For the delineation of the FM, RB, and BB segmentations, the two radiologists required a median time (IQR) of 113 (54 to 251.5), 21 (9.25 to 38), and 16 (12 to 64.25) s, respectively (p=0.04). In dataset 1, the mean AUC (95% CI) of the FM, RB, BB, and nnU-Net model were 0.964 (0.96 to 0.968), 0.985 (0.983 to 0.987), 0.961 (0.956 to 0.965), and 0.878 (0.869 to 0.888). In dataset 2, the mean AUC (95% CI) of the FM, RB, BB, and nnU-Net model were 0.717 (0.705 to 0.729), 0.919 (0.913 to 0.924), 0.699 (0.687 to 0.711), and 0.644 (0.632 to 0.657). Radiomic biopsy-based models outperformed FM and BB models in prediction of benign and malignant lung nodules in two independent datasets while deep-learning segmentation-based models performed similarly to FM and BB. RB could be a more efficient segmentation method, but further validation is needed.

    View details for DOI 10.1117/1.JMI.10.4.044006

    View details for PubMedID 37564098

    View details for PubMedCentralID PMC10411216

  • Best Practices for Clinical Skin Image Acquisition in Translational Artificial Intelligence Research. The Journal of investigative dermatology Phung, M., Muralidharan, V., Rotemberg, V., Novoa, R. A., Chiou, A. S., Sadée, C. Y., Rapaport, B., Yekrang, K., Bitz, J., Gevaert, O., Ko, J. M., Daneshjou, R. 2023; 143 (7): 1127-1132

    Abstract

    Recent advances in artificial intelligence research have led to an increase in the development of algorithms for detecting malignancies from clinical and dermoscopic images of skin diseases. These methods are dependent on the collection of training and testing data. There are important considerations when acquiring skin images and data for translational artificial intelligence research. In this paper, we discuss the best practices and challenges for light photography image data collection, covering ethics, image acquisition, labeling, curation, and storage. The purpose of this work is to improve artificial intelligence for malignancy detection by supporting intentional data collection and collaboration between subject matter experts, such as dermatologists and data scientists.

    View details for DOI 10.1016/j.jid.2023.02.035

    View details for PubMedID 37353282

  • Targeting KDM2A Enhances T Cell Infiltration in NSD1-Deficient Head and Neck Squamous Cell Carcinoma. Cancer research Chen, C., Shin, J. H., Fang, Z., Brennan, K., Horowitz, N. B., Pfaff, K. L., Welsh, E. L., Rodig, S. J., Gevaert, O., Gozani, O., Uppaluri, R., Sunwoo, J. B. 2023

    Abstract

    In head and neck squamous cell carcinoma (HNSCC), a significant proportion of tumors have inactivating mutations in the histone methyltransferase NSD1. In these tumors, NSD1 inactivation is a driver of T cell exclusion from the tumor microenvironment (TME). A better understanding of the NSD1-mediated mechanism regulating infiltration of T cells into the TME could help identify approaches to overcome immunosuppression. Here, we demonstrated that NSD1 inactivation results in lower levels of H3K36 di-methylation and higher levels of H3K27 tri-methylation, the latter being a known repressive histone mark enriched on the promoters of key T cell chemokines CXCL9 and CXCL10. HNSCC with NSD1 mutations had lower levels of these chemokines and lacked responses to PD-1 immune checkpoint blockade. Inhibition of KDM2A, the primary lysine demethylase that is selective for H3K36, reversed the altered histone marks induced by NSD1 loss and restored T cell infiltration into the TME. Importantly, KDM2A suppression decreased growth of NSD1-deficient tumors in immunocompetent, but not in immunodeficient, mice. Together, these data indicate that KDM2A is an immunotherapeutic target for overcoming immune exclusion in HNSCC.

    View details for DOI 10.1158/0008-5472.CAN-22-3114

    View details for PubMedID 37311054

  • Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation. ArXiv Zhan, X., Sun, J., Liu, Y., Cecchi, N. J., Le Flao, E., Gevaert, O., Zeineh, M. M., Camarillo, D. B. 2023

    Abstract

    Machine learning head models (MLHMs) are developed to estimate brain deformation for early detection of traumatic brain injury (TBI). However, overfitting to simulated impacts and the lack of generalizability caused by distributional shift across head impact datasets hinder the broad clinical application of current MLHMs. We propose brain deformation estimators that integrate unsupervised domain adaptation with a deep neural network to predict whole-brain maximum principal strain (MPS) and MPS rate (MPSR). With 12,780 simulated head impacts, we performed unsupervised domain adaptation on on-field head impacts from 302 college football (CF) impacts and 457 mixed martial arts (MMA) impacts using domain regularized component analysis (DRCA) and cycle-GAN-based methods. The new model improved the MPS/MPSR estimation accuracy, with the DRCA method significantly outperforming other domain adaptation methods in prediction accuracy (p<0.001): MPS RMSE: 0.027 (CF) and 0.037 (MMA); MPSR RMSE: 7.159 (CF) and 13.022 (MMA). On two additional hold-out test sets with 195 college football impacts and 260 boxing impacts, the DRCA model significantly outperformed the baseline model without domain adaptation in MPS and MPSR estimation accuracy (p<0.001). The DRCA domain adaptation reduces the MPS/MPSR estimation error to well below TBI thresholds, enabling accurate brain deformation estimation to detect TBI in future clinical applications.

    View details for PubMedID 37332565

    View details for PubMedCentralID PMC10274939

  • AI-based radiomic biomarkers to predict PD-(L)1 immune checkpoint inhibitor response within PD-L1 high/low/negative expression categories in stage IV NSCLC Simon, G. R., Jordan, P., Sako, C., Beasley, R., Owen, D., Patel, A., Curti, B. D., Weerasinghe, R. K., Lee, S., Amini, A., Liu, A., Page, R. D., Swalduz, A., Beregi, J., Sanchez, S., Gevaert, O., Parikh, R., Aerts, H. LIPPINCOTT WILLIAMS & WILKINS. 2023
  • Multi-center real-world data curation and assessment of tumor growth rate and overall survival in advanced NSCLC treated with PD-(L)1 immune checkpoint inhibitor therapy Sako, C., Jordan, P., McCall, R., Patel, A., Owen, D., Amini, A., Liu, A., Curti, B. D., Weerasinghe, R. K., Lee, S., Page, R. D., Swalduz, A., Beregi, J., Sanchez, S., Gevaert, O., Parikh, R., Simon, G. R., Aerts, H. LIPPINCOTT WILLIAMS & WILKINS. 2023
  • Generative Editing via Convolutional Obscuring (GECO): A Generative Adversarial Network for MRI de-artifacting Bagley, B., Petrov, S., Cheng, G., Armanasu, M., Fischbein, N., Jiang, B., Iv, M., Tranvinh, E., Zeineh, M., Gevaert, O. LIPPINCOTT WILLIAMS & WILKINS. 2023
  • A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer. Scientific data Ding, K., Zhou, M., Wang, H., Gevaert, O., Metaxas, D., Zhang, S. 2023; 10 (1): 231

    Abstract

    The success of training computer-vision models heavily relies on the support of large-scale, real-world images with annotations. Yet such an annotation-ready dataset is difficult to curate in pathology due to the privacy protection and excessive annotation burden. To aid in computational pathology, synthetic data generation, curation, and annotation present a cost-effective means to quickly enable data diversity that is required to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with the annotation for nuclei semantic segmentation, termed as Synthetic Nuclei and annOtation Wizard (SNOW). The proposed SNOW is developed via a standardized workflow by applying the off-the-shelf image generator and nuclei annotator. The dataset contains overall 20k image tiles and 1,448,522 annotated nuclei with the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that synthetic-data-trained models are competitive under a variety of model training settings, expanding the scope of better using synthetic images for enhancing downstream data-driven clinical tasks.

    View details for DOI 10.1038/s41597-023-02125-y

    View details for PubMedID 37085533

    View details for PubMedCentralID PMC10121551

  • Machine-learning-based head impact subtyping based on the spectral densities of the measurable head kinematics. Journal of sport and health science Zhan, X., Li, Y., Liu, Y., Cecchi, N. J., Raymond, S. J., Zhou, Z., Alizadeh, H. V., Ruan, J., Barbat, S., Tiernan, S., Gevaert, O., Zeineh, M. M., Grant, G. A., Camarillo, D. B. 2023

    Abstract

    Traumatic brain injury can be caused by head impacts, but many brain injury risk estimation models are not equally accurate across the variety of impacts that patients may undergo, and the characteristics of different types of impacts are not well studied. We investigated the spectral characteristics of different head impact types with kinematics classification. Data were analyzed from 3262 head impacts from lab reconstruction, American football, mixed martial arts, and publicly available car crash data. A random forest classifier with spectral densities of linear acceleration and angular velocity was built to classify head impact types (e.g., football, car crash, mixed martial arts). To test the classifier robustness, another 271 lab-reconstructed impacts were obtained from 5 other instrumented mouthguards. Finally, with the classifier, type-specific, nearest-neighbor regression models were built for brain strain. The classifier reached a median accuracy of 96% over 1000 random partitions of training and test sets. The most important features in the classification included both low-frequency and high-frequency features, both linear acceleration features and angular velocity features. Different head impact types had different distributions of spectral densities in low- and high-frequency ranges (e.g., the spectral densities of MMA impacts were higher in the high-frequency range than in the low-frequency range). The type-specific regression showed a generally higher R²-value than baseline models without classification. The machine-learning-based classifier enables a better understanding of the impact kinematics spectral density in different sports, and it can be applied to evaluate the quality of impact-simulation systems and on-field data augmentation.

    View details for DOI 10.1016/j.jshs.2023.03.003

    View details for PubMedID 36921692

  • Early Detection of Lung Cancer in the NLST Dataset. medRxiv : the preprint server for health sciences Mukherjee, P., Brezhneva, A., Napel, S., Gevaert, O. 2023

    Abstract

    Lung cancer is the leading cause of cancer mortality in the U.S. The effectiveness of standard treatments, including surgery, chemotherapy or radiotherapy, depends on several factors like type and stage of cancer, with the survival rate being much worse for later cancer stages. The National Lung Screening Trial (NLST) established that patients screened using low-dose Computed Tomography (CT) had a 15 to 20 percent lower risk of dying from lung cancer than patients screened using chest X-rays. While CT excelled at detecting small early stage malignant nodules, a large proportion of patients (>25%) screened positive and only a small fraction (<10%) of these positive screens actually had or developed cancer in the subsequent years. We developed a model to distinguish between high and low risk patients among the positive screens, predicting the likelihood of having or developing lung cancer at the current time point or in subsequent years non-invasively, based on current and previous CT imaging data. However, most of the nodules in NLST are very small, and nodule segmentations or even precise locations are unavailable. Our model comprises two stages: the first stage is a neural network model trained on the Lung Image Database Consortium (LIDC-IDRI) cohort which detects nodules and assigns them malignancy scores. The second stage is a boosted tree which outputs a cancer probability for a patient based on the nodule information (location and malignancy score) predicted by the first stage. Our model, built on a subset of the NLST cohort (n = 1138), shows excellent performance, achieving an area under the receiver operating characteristics curve (ROC AUC) of 0.85 when predicting based on CT images from all three time points available in the NLST dataset.

    View details for DOI 10.1101/2023.03.01.23286632

    View details for PubMedID 36909593

  • Machine intelligence for radiation science: summary of the Radiation Research Society 67th annual meeting symposium. International journal of radiation biology Wilson, L. J., Kiffer, F. C., Berrios, D. C., Bryce-Atkinson, A., Costes, S. V., Gevaert, O., Matarese, B. F., Miller, J., Mukherjee, P., Peach, K., Schofield, P. N., Slater, L. T., Langen, B. 2023: 1-14

    Abstract

    The era of high-throughput techniques created big data in the medical field and research disciplines. Machine intelligence (MI) approaches can overcome critical limitations on how those large-scale data sets are processed, analyzed, and interpreted. The 67th Annual Meeting of the Radiation Research Society featured a symposium on MI approaches to highlight recent advancements in the radiation sciences and their clinical applications. This article summarizes three of those presentations regarding recent developments for metadata processing and ontological formalization, data mining for radiation outcomes in pediatric oncology, and imaging in lung cancer.

    View details for DOI 10.1080/09553002.2023.2173823

    View details for PubMedID 36735963

  • Topological data analysis of thoracic radiographic images shows improved radiomics-based lung tumor histology prediction. Patterns (New York, N.Y.) Vandaele, R., Mukherjee, P., Selby, H. M., Shah, R. P., Gevaert, O. 2023; 4 (1): 100657

    Abstract

    Topological data analysis provides tools to capture wide-scale structural shape information in data. Its main method, persistent homology, has found successful applications to various machine-learning problems. Despite its recent gain in popularity, much of its potential for medical image analysis remains undiscovered. We explore the prominent learning problems on thoracic radiographic images of lung tumors for which persistent homology improves radiomic-based learning. It turns out that our topological features well capture complementary information important for benign versus malignant and adenocarcinoma versus squamous cell carcinoma tumor prediction while contributing less consistently to small cell versus non-small cell, an interesting result in its own right. Furthermore, while radiomic features are better for predicting malignancy scores assigned by expert radiologists through visual inspection, we find that topological features are better for predicting more accurate histology assessed through long-term radiology review, biopsy, surgical resection, progression, or response.

    View details for DOI 10.1016/j.patter.2022.100657

    View details for PubMedID 36699734

  • EpiMix: an integrative tool for epigenomic subtyping using DNA methylation. bioRxiv : the preprint server for biology Zheng, Y., Jun, J., Brennan, K., Gevaert, O. 2023

    Abstract

    DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer, immunological, and cardiovascular diseases. Recent technological advances enable genome-wide quantification of DNAme in large human cohorts. So far, existing methods have not been evaluated to identify differential DNAme present in large and heterogeneous patient cohorts. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared to existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and lncRNAs. Using cell-type specific data from two separate studies, we discovered novel epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven non-coding RNAs in non-small cell lung cancer.

    View details for DOI 10.1101/2023.01.03.522660

    View details for PubMedID 36711917

  • RNA-to-image multi-cancer synthesis using cascaded diffusion models bioRxiv Carrillo-Perez, F., Pizurica, M., Zheng, Y., Shen, J., Gevaert, O. 2023
  • Development and validation of MedDRA Tagger: a tool for extraction and structuring medical information from clinical notes. medRxiv : the preprint server for health sciences Humbert-Droz, M., Corley, J., Tamang, S., Gevaert, O. 2022

    Abstract

    Rapid and automated extraction of clinical information from patients' notes is a desirable though difficult task. Natural language processing (NLP) and machine learning have great potential to automate and accelerate such applications, but developing such models can require a large amount of labeled clinical text, which can be a slow and laborious process. To address this gap, we propose the MedDRA tagger, a fast annotation tool that makes use of industrial level libraries such as spaCy, biomedical ontologies and weak supervision to annotate and extract clinical concepts at scale. The tool can be used to annotate clinical text and obtain labels for training machine learning models and further refine the clinical concept extraction performance, or to extract clinical concepts for observational study purposes. To demonstrate the usability and versatility of our tool, we present three different use cases: we use the tagger to determine patients with a primary brain cancer diagnosis, we show evidence of rising mental health symptoms at the population level and our last use case shows the evolution of COVID-19 symptomatology throughout three waves between February 2020 and October 2021. The validation of our tool showed good performance on both specific annotations from our development set (F1 score 0.81) and open source annotated data set (F1 score 0.79). We successfully demonstrate the versatility of our pipeline with three different use cases. Finally, we note that the modular nature of our tool allows for a straightforward adaptation to another biomedical ontology. We also show that our tool is independent of EHR system, and as such generalizable.

    View details for DOI 10.1101/2022.12.14.22283470

    View details for PubMedID 36561189

  • RADIOMICS-BASED MULTI-MODAL PREDICTION OF TREATMENT RESPONSE TO PD-1/PD-L1 IMMUNE CHECKPOINT INHIBITOR (ICI) THERAPY IN STAGE IV NON-SMALL CELL LUNG CARCINOMA (MNSCLC) Parikh, R., Jordan, P., Ciaravino, R., Beasley, R., Patel, A., Owen, D., Amini, A., Curti, B., Page, R., Swalduz, A., Beregi, J., Chrusciel, J., Snyder, E., Mukherjee, P., Selby, H., Lee, S., Weerasinghe, R., Pindikuri, S., Weiss, J., Wentland, A., Kirpalani, A., Liu, A., Gevaert, O., Simon, G., Aerts, H. BMJ PUBLISHING GROUP. 2022: A1346
  • Piecewise Multivariate Linearity Between Kinematic Features and Cumulative Strain Damage Measure (CSDM) Across Different Types of Head Impacts. Annals of biomedical engineering Zhan, X., Li, Y., Liu, Y., Cecchi, N. J., Gevaert, O., Zeineh, M. M., Grant, G. A., Camarillo, D. B. 2022

    Abstract

    In a previous study, we found that the relationship between brain strain and kinematic features cannot be described by a generalized linear model across different types of head impacts. In this study, we investigate if such a linear relationship exists when partitioning head impacts using a data-driven approach. We applied the K-means clustering method to partition 3161 impacts from various sources including simulation, college football, mixed martial arts, and car crashes. We found piecewise multivariate linearity between the cumulative strain damage measure (CSDM; assessed at the threshold of 0.15) and head kinematic features. Compared with the linear regression models without partition and the partition according to the types of head impacts, K-means-based data-driven partition showed significantly higher CSDM regression accuracy, which suggested the presence of piecewise multivariate linearity across types of head impacts. Additionally, we compared the piecewise linearity with the partitions based on individual features used in clustering. We found that the partition with maximum angular acceleration magnitude at 4706 rad/s² led to the highest piecewise linearity. This study may contribute to an improved method for the rapid prediction of CSDM in the future.

    View details for DOI 10.1007/s10439-022-03020-0

    View details for PubMedID 35922726

  • Reliably Filter Drug-induced Liver Injury Literature with Natural Language Processing and Conformal Prediction. IEEE journal of biomedical and health informatics Zhan, X., Wang, F., Gevaert, O. 2022; PP

    Abstract

    Drug-induced liver injury describes the adverse effects of drugs that damage the liver. Life-threatening results including liver failure or death were also reported in severe cases. Therefore, the events related to liver injury are strictly monitored for all approved drugs and liver toxicity is an important assessment for new drug candidates. These reports are documented in research papers that contain preliminary in vitro and in vivo experiments. Conventionally, data extraction from previous publications relies heavily on resource-demanding manual labelling, which considerably restricts the efficiency of the information extraction. The development of natural language processing techniques enables the automatic processing of biomedical texts. Herein, based on around 28,000 papers (titles and abstracts) provided by the Critical Assessment of Massive Data Analysis challenge, this study benchmarked model performances on filtering liver-damage-related literature. Among five text embedding techniques, the model using term frequency-inverse document frequency (TF-IDF) and logistic regression outperformed others with an accuracy of 0.957 on the validation set. Furthermore, an ensemble model with similar overall performances was developed with a logistic regression model on the predicted probability given by separate models with different vectorization techniques. The ensemble model achieved a high accuracy of 0.954 and an F1 score of 0.955 in the hold-out validation data in the challenge. Moreover, important words in positive/negative predictions were identified via model interpretation. The prediction reliability was quantified with conformal prediction, which provides users with control over the prediction uncertainty. Overall, the ensemble model and TF-IDF model reached satisfactory classification results, which can be used by researchers to rapidly filter literature that describes events related to liver injury induced by medications.

    View details for DOI 10.1109/JBHI.2022.3193365

    View details for PubMedID 35877798
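
    The abstract above mentions conformal prediction as a way to control prediction uncertainty. A minimal sketch of split conformal prediction for a binary classifier follows. It assumes a held-out calibration set of predicted class probabilities and uses one minus the probability of the true label as the nonconformity score; the function names and the calibration numbers are hypothetical, and the paper may use a different variant.

```python
import math
from typing import Sequence, Set


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int],
                        alpha: float) -> float:
    """Score threshold giving roughly (1 - alpha) coverage on exchangeable data."""
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration points: every label is kept
    return scores[rank - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> Set[int]:
    """Keep every label whose nonconformity score is within the threshold."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}


# Hypothetical calibration data: [P(not liver injury), P(liver injury)] per paper.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.95, 0.05],
             [0.1, 0.9], [0.7, 0.3], [0.4, 0.6], [0.85, 0.15]]
cal_labels = [0, 1, 0, 1, 0, 1, 1, 1, 0]

q = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
print(prediction_set([0.97, 0.03], q))  # confident case: a single label
print(prediction_set([0.55, 0.45], q))  # ambiguous case: both labels kept
```

    A smaller `alpha` asks for higher coverage and therefore yields larger prediction sets; papers whose set contains both labels can be routed to manual review.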

  • Peripheral blood DNA methylation profiles predict future development of B-cell Non-Hodgkin Lymphoma. NPJ precision oncology Espin-Perez, A., Brennan, K., Ediriwickrema, A. S., Gevaert, O., Lossos, I. S., Gentles, A. J. 2022; 6 (1): 53

    Abstract

    Lack of accurate methods for early lymphoma detection limits the ability to cure patients. Since patients with Non-Hodgkin lymphomas (NHL) who present with advanced disease have worse outcomes, accurate and sensitive methods for early detection are needed to improve patient care. We developed a DNA methylation-based prediction tool for NHL, based on blood samples collected prospectively from 278 apparently healthy patients who were followed for up to 16 years to monitor for NHL development. A predictive score was developed using machine learning methods in a robust training/validation framework. Our predictive score incorporates CpG DNA methylation at 135 genomic positions, with higher scores predicting higher risk. It was 85% and 78% accurate for identifying patients at risk of developing future NHL, in patients with high or low epigenetic mitotic clock respectively, in a validation cohort. It was also sensitive at detecting active NHL (96.3% accuracy) and healthy status (95.6% accuracy) in additional independent cohorts. Scores optimized for specific NHL subtypes showed significant but lower accuracy for predicting other subtypes. Our score incorporates hyper-methylation of Polycomb and HOX genes, which have roles in NHL development, as well as PAX5 - a master transcriptional regulator of B-cell fate. Subjects with higher risk scores showed higher regulatory T-cells, memory B-cells, but lower naive T helper lymphocytes fractions in the blood. Future prospective studies will be required to confirm the utility of our signature for managing patients who are at high risk for developing future NHL.

    View details for DOI 10.1038/s41698-022-00295-3

    View details for PubMedID 35864305

  • Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction. Scientific reports Agarwal, K., Choudhury, S., Tipirneni, S., Mukherjee, P., Ham, C., Tamang, S., Baker, M., Tang, S., Kocaman, V., Gevaert, O., Rallo, R., Reddy, C. K. 2022; 12 (1): 10748

    Abstract

    Developing prediction models for emerging infectious diseases from relatively small numbers of cases is a critical need for improving pandemic preparedness. Using COVID-19 as an exemplar, we propose a transfer learning methodology for developing predictive models from multi-modal electronic healthcare records by leveraging information from more prevalent diseases with shared clinical characteristics. Our novel hierarchical, multi-modal model ([Formula: see text]) integrates baseline risk factors from the natural language processing of clinical notes at admission, time-series measurements of biomarkers obtained from laboratory tests, and discrete diagnostic, procedure and drug codes. We demonstrate the alignment of [Formula: see text]'s predictions with well-established clinical knowledge about COVID-19 through univariate and multivariate risk factor driven sub-cohort analysis. [Formula: see text]'s superior performance over state-of-the-art methods shows that leveraging patient data across modalities and transferring prior knowledge from similar disorders is critical for accurate prediction of patient outcomes, and this approach may serve as an important tool in the early response to future pandemics.

    View details for DOI 10.1038/s41598-022-13072-w

    View details for PubMedID 35750878

  • ImaGene: A robust AI-based software platform for tumor radiogenomic evaluation and reporting Sukhadia, S. S., Tyagi, A., Venkatraman, V., Mukherjee, P., Prathosh, A. P., Divate, M., Gevaert, O., Nagaraj, S. H. AMER ASSOC CANCER RESEARCH. 2022
  • A REAL-TIME SYSTEM TO MONITOR BRAIN STRAIN TO DETECT DANGEROUS HEAD IMPACTS Zhan, X., Liu, Y., Gevaert, O., Zeineh, M., Camarillo, D. MARY ANN LIEBERT, INC. 2022: A22
  • Machine-Learning-Based Late Fusion on Multi-Omics and Multi-Scale Data for Non-Small-Cell Lung Cancer Diagnosis. Journal of personalized medicine Carrillo-Perez, F., Morales, J. C., Castillo-Secilla, D., Gevaert, O., Rojas, I., Herrera, L. J. 2022; 12 (4)

    Abstract

    Differentiation between the various non-small-cell lung cancer subtypes is crucial for providing an effective treatment to the patient. For this purpose, machine learning techniques have been used in recent years over the available biological data from patients. However, in most cases this problem has been treated using a single-modality approach, not exploring the potential of the multi-scale and multi-omic nature of cancer data for the classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) by using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and we explore the interactions and gains that can be obtained by fusing their outputs in an increasing manner, by using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving those results that each independent model obtains and those presented in the literature for this problem. These obtained results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving the diagnosis of the patient.

    View details for DOI 10.3390/jpm12040601

    View details for PubMedID 35455716
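
    The late fusion strategy described in the abstract above combines the outputs of independently trained per-modality models. A minimal sketch follows, assuming each model emits class probabilities and that fusion is a weighted average. The modality names, class order, and weights are hypothetical, and the optimization approach the paper uses to compute the fusion parameters is not reproduced here.

```python
from typing import Dict, List


def late_fusion(per_modality_probs: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of class probabilities over the available modalities."""
    total = sum(weights[m] for m in per_modality_probs)
    if total <= 0:
        raise ValueError("fusion weights must sum to a positive value")
    n_classes = len(next(iter(per_modality_probs.values())))
    fused = [0.0] * n_classes
    for modality, probs in per_modality_probs.items():
        w = weights[modality] / total  # normalize over modalities that are present
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused


# Hypothetical outputs for one patient; class order is an assumption.
probs = {
    "rna_seq": [0.7, 0.2, 0.1],
    "whole_slide_imaging": [0.4, 0.5, 0.1],
    "dna_methylation": [0.6, 0.3, 0.1],
}
# Hypothetical weights; in practice these would be tuned on validation data.
weights = {"rna_seq": 2.0, "whole_slide_imaging": 1.0, "dna_methylation": 1.0}

fused = late_fusion(probs, weights)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

    Because the weights are renormalized over the modalities actually supplied, the same function can fuse any subset of modalities for a patient with missing data.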

  • Find the spatial co-variation of brain deformation with principal component analysis. IEEE transactions on bio-medical engineering Zhan, X., Liu, Y., Cecchi, N. J., Gevaert, O., Zeineh, M., Grant, G., Camarillo, D. B. 2022; PP

    Abstract

    Strain and strain rate are effective traumatic brain injury metrics. In finite element (FE) head models, thousands of elements are used to represent the spatial distribution of these metrics. Because these metrics result from brain inertia, their spatial distribution can be represented in a more concise pattern. Since head kinematic features and brain deformation vary largely across head impact types, we apply principal component analysis (PCA) to find the spatial co-variation of injury metrics (maximum principal strain (MPS), MPS rate (MPSR) and MPS × MPSR) in four impact types: simulation, football, mixed martial arts and car crashes, and use the PCA to find patterns in these metrics and improve the machine learning head model (MLHM). We applied PCA to decompose the injury metrics for all impacts in each impact type, and investigated the spatial co-variation using the first principal component (PC1). Furthermore, we developed an MLHM to predict PC1 and then inverse-transform it to predict the metrics for all brain elements. The accuracy, the model complexity and the size of the training dataset of the PCA-MLHM are compared with the previous MLHM. PC1 explained >80% variance on the datasets. Based on PC1 coefficients, the corpus callosum and midbrain exhibit high variance on all datasets. Finally, the PCA-MLHM reduced model parameters by 74% with a similar MPS estimation accuracy. The brain injury metric in a dataset can be decomposed into mean components and PC1 with high explained variance. The spatial co-variation analysis enables better interpretation of the patterns in brain injury metrics. It also improves the efficiency of MLHM.

    View details for DOI 10.1109/TBME.2022.3163230

    View details for PubMedID 35349430

  • NSD1 mutations deregulate transcription and DNA methylation of bivalent developmental genes in Sotos syndrome. Human molecular genetics Brennan, K., Zheng, H., Fahrner, J. A., Shin, J. H., Gentles, A. J., Schaefer, B., Sunwoo, J. B., Bernstein, J. A., Gevaert, O. 2022

    Abstract

    Sotos syndrome (SS), the most common overgrowth with intellectual disability (OGID) disorder, is caused by inactivating germline mutations of NSD1, which encodes a histone H3 lysine 36 methyltransferase. To understand how NSD1 inactivation deregulates transcription and DNA methylation (DNAm), and to explore how these abnormalities affect human development, we profiled transcription and DNAm in SS patients and healthy control individuals. We identified a transcriptional signature that distinguishes individuals with SS from controls and was also deregulated in NSD1 mutated cancers. Most abnormally expressed genes displayed reduced expression in SS; these downregulated genes consisted mostly of bivalent genes and were enriched for regulators of development and neural synapse function. DNA hypomethylation was strongly enriched within promoters of transcriptionally deregulated genes: Overexpressed genes displayed hypomethylation at their transcription start sites (TSSs) while underexpressed genes featured hypomethylation at polycomb binding sites within their promoter CpG island shores. SS patients featured accelerated molecular aging at the levels of both transcription and DNAm. Overall, these findings indicate that NSD1-deposited H3K36 methylation regulates transcription by directing promoter DNA methylation, partially by repressing polycomb repressive complex 2 (PRC2) activity. These findings could explain the phenotypic similarity of SS to OGID disorders that are caused by mutations in PRC2 complex-encoding genes.

    View details for DOI 10.1093/hmg/ddac026

    View details for PubMedID 35094088

  • ImaGene: a web-based software platform for tumor radiogenomic evaluation and reporting. Bioinformatics advances Sukhadia, S. S., Tyagi, A., Venkataraman, V., Mukherjee, P., Prasad, P., Gevaert, O., Nagaraj, S. H. 2022; 2 (1): vbac079

    Abstract

    Summary: Radiographic imaging techniques provide insight into the imaging features of tumor regions of interest, while immunohistochemistry and sequencing techniques performed on biopsy samples yield omics data. Relationships between tumor genotype and phenotype can be identified from these data through traditional correlation analyses and artificial intelligence (AI) models. However, the radiogenomics community lacks a unified software platform with which to conduct such analyses in a reproducible manner. To address this gap, we developed ImaGene, a web-based platform that takes tumor omics and imaging datasets as inputs, performs correlation analysis between them, and constructs AI models. ImaGene has several modifiable configuration parameters and produces a report displaying model diagnostics. To demonstrate the utility of ImaGene, we utilized data for invasive breast carcinoma (IBC) and head and neck squamous cell carcinoma (HNSCC) and identified potential associations between imaging features and nine genes (WT1, LGI3, SP7, DSG1, ORM1, CLDN10, CST1, SMTNL2, and SLC22A31) for IBC and eight genes (NR0B1, PLA2G2A, MAL, CLDN16, PRDM14, VRTN, LRRN1, and MECOM) for HNSCC. ImaGene has the potential to become a standard platform for radiogenomic tumor analyses due to its ease of use, flexibility, and reproducibility, playing a central role in the establishment of an emerging radiogenomic knowledge base. Availability and implementation: www.ImaGene.pgxguide.org, https://github.com/skr1/Imagene.git. Supplementary information: Supplementary data are available at https://github.com/skr1/Imagene.git.

    View details for DOI 10.1093/bioadv/vbac079

    View details for PubMedID 36699376

  • Tumor response as defined by iRECIST in gastrointestinal malignancies treated with PD-1 and PD-L1 inhibitors and correlation with survival. BMC cancer Xie, P., Zheng, H., Chen, H., Wei, K., Pan, X., Xu, Q., Wang, Y., Tang, C., Gevaert, O., Meng, X. 2021; 21 (1): 1246

    Abstract

    BACKGROUND: Atypical tumor response patterns during immune checkpoint inhibitor therapy pose a challenge to clinicians and investigators in immuno-oncology practice. This study evaluated tumor burden dynamics to identify imaging biomarkers for treatment response and overall survival (OS) in advanced gastrointestinal malignancies treated with PD-1/PD-L1 inhibitors. METHODS: This retrospective study enrolled a total of 198 target lesions in 75 patients with advanced gastrointestinal malignancies treated with PD-1/PD-L1 inhibitors between January 2017 and March 2021. Tumor diameter changes as defined by immunotherapy Response Evaluation Criteria in Solid Tumors (iRECIST) were studied to determine treatment response and association with OS. RESULTS: Based on the best overall response, the change in tumor diameter ranged from -100% to +135.3% (median: -9.6%). The overall response rate was 32.0% (24/75), and the rate of durable disease control for at least 6 months was 30.7% (23/75: 1 iCR [immune complete response], 20 iPR [immune partial response], and 2 iSD [immune stable disease]). Using univariate analysis, patients whose tumor diameter maintained a <20% increase from baseline (48/75, 64.0%) had longer OS and a reduced risk of death compared with those with a ≥20% increase (27/75, 36.0%) (median OS: 80 months vs. 48 months, HR=0.22, P=0.034). The differences in age (HR=1.09, P=0.01), combined surgery (HR=0.15, P=0.01) and cancer type (HR=0.23, P=0.001) were significant. In multivariable analysis, patients whose tumor diameter showed a <20% increase had notably reduced hazards of death (HR=0.15, P=0.01) after adjusting for age, combined surgery, KRAS status, cancer type, mismatch repair (MMR) status, treatment course and cancer differentiation. Two patients (2.7%) showed pseudoprogression. CONCLUSIONS: A tumor diameter increase of <20% from baseline during therapy in gastrointestinal malignancies was associated with therapeutic benefit and longer OS and may serve as a practical imaging marker for treatment response, clinical outcome and treatment decision making.

    View details for DOI 10.1186/s12885-021-08944-9

    View details for PubMedID 34798858

  • Precision of MRI radiomics features in the liver and hepatocellular carcinoma. European radiology Carbonell, G., Kennedy, P., Bane, O., Kirmani, A., El Homsi, M., Stocker, D., Said, D., Mukherjee, P., Gevaert, O., Lewis, S., Hectors, S., Taouli, B. 2021

    Abstract

    OBJECTIVES: To assess the precision of MRI radiomics features in hepatocellular carcinoma (HCC) tumors and liver parenchyma. METHODS: The study population consisted of 55 patients, including 16 with untreated HCCs, who underwent two repeat contrast-enhanced abdominal MRI exams within 1 month to evaluate: (1) test-retest repeatability using the same MRI system (n=28, 10 HCCs); (2) inter-platform reproducibility between different MRI systems (n=27, 6 HCCs); (3) inter-observer reproducibility (n=16, 16 HCCs). Shape and 1st- and 2nd-order radiomics features were quantified on pre-contrast T1-weighted imaging (WI), T1WI portal venous phase (pvp), T2WI, and ADC (apparent diffusion coefficient), on liver regions of interest (ROIs) and HCC volumes of interest (VOIs). Precision was assessed by calculating intraclass correlation coefficient (ICC), concordance correlation coefficient (CCC), and coefficient of variation (CV). RESULTS: There was moderate to excellent test-retest repeatability of shape and 1st- and 2nd-order features for all sequences in HCCs (ICC: 0.53-0.99; CV: 3-29%), and moderate to good test-retest repeatability of 1st- and 2nd-order features for T1WI sequences, and 2nd-order features for T2WI in the liver (ICC: 0.53-0.73; CV: 12-19%). There was poor inter-platform reproducibility for all features and sequences, except for shape and 1st-order features on T1WI in HCCs (CCC: 0.58-0.99; CV: 3-15%). Good to excellent inter-observer reproducibility was found for all features and sequences in HCCs (CCC: 0.80-0.99; CV: 4-15%) and moderate to good for liver (CCC: 0.45-0.86; CV: 6-25%). CONCLUSIONS: MRI radiomics features have acceptable repeatability in the liver and HCC when using the same MRI system and across readers but have low reproducibility across MR systems, except for shape and 1st-order features on T1WI. Data must be interpreted with caution when performing multiplatform radiomics studies. KEY POINTS: MRI radiomics features have acceptable repeatability when using the same MRI system but are less reproducible when using different MRI platforms. MRI radiomics features extracted from T1-weighted imaging show greater stability across exams than those from T2-weighted imaging and ADC. Inter-observer reproducibility of MRI radiomics features was found to be good in HCC tumors and acceptable in liver parenchyma.

    View details for DOI 10.1007/s00330-021-08282-1

    View details for PubMedID 34564745

  • Predictive Factors of Kinematics in Traumatic Brain Injury from Head Impacts Based on Statistical Interpretation. Annals of biomedical engineering Zhan, X., Li, Y., Liu, Y., Domel, A. G., Alizadeh, H. V., Zhou, Z., Cecchi, N. J., Raymond, S. J., Tiernan, S., Ruan, J., Barbat, S., Gevaert, O., Zeineh, M. M., Grant, G. A., Camarillo, D. B. 2021

    Abstract

    Brain tissue deformation resulting from head impacts is primarily caused by rotation and can lead to traumatic brain injury. To quantify brain injury risk based on measurements of kinematics on the head, finite element (FE) models and various brain injury criteria based on different factors of these kinematics have been developed, but the contribution of different kinematic factors has not been comprehensively analyzed across different types of head impacts in a data-driven manner. To better design brain injury criteria, the predictive power of rotational kinematics factors, which differ in (1) the derivative order (angular velocity, angular acceleration, angular jerk), (2) the direction and (3) the power (e.g., square-rooted, squared, cubic) of the angular velocity, was analyzed based on different datasets including laboratory impacts, American football, mixed martial arts (MMA), NHTSA automobile crashworthiness tests and NASCAR crash events. Ordinary least squares regressions were built from kinematics factors to the 95% maximum principal strain (MPS95), and we compared zero-order correlation coefficients, structure coefficients, commonality analysis, and dominance analysis. The angular acceleration, the magnitude and the first power factors showed the highest predictive power for the majority of impacts, including laboratory impacts and American football impacts, with few exceptions (angular velocity for MMA and NASCAR impacts). The predictive power of rotational kinematics about the three directions (x: posterior-to-anterior, y: left-to-right, z: superior-to-inferior) varied with different sports and types of head impacts.

    View details for DOI 10.1007/s10439-021-02813-z

    View details for PubMedID 34244908

  • Machine Learning Radiomics Model for Early Identification of Small-Cell Lung Cancer on Computed Tomography Scans. JCO clinical cancer informatics Shah, R. P., Selby, H. M., Mukherjee, P., Verma, S., Xie, P., Xu, Q., Das, M., Malik, S., Gevaert, O., Napel, S. 2021; 5: 746-757

    Abstract

    PURPOSE: Small-cell lung cancer (SCLC) is the deadliest form of lung cancer, partly because of its short doubling time. Delays in imaging identification and diagnosis of nodules create a risk for stage migration. The purpose of our study was to determine if a machine learning radiomics model can detect SCLC on computed tomography (CT) among all nodules at least 1 cm in size. MATERIALS AND METHODS: Computed tomography scans from a single institution were selected and resampled to 1 × 1 × 1 mm. Studies were divided into SCLC and other scans comprising benign, adenocarcinoma, and squamous cell carcinoma that were segregated into group A (noncontrast scans) and group B (contrast-enhanced scans). Four machine learning classification models, support vector classifier, random forest (RF), XGBoost, and logistic regression, were used to generate radiomic models using 59 quantitative first-order and texture Imaging Biomarker Standardization Initiative compliant PyRadiomics features, which were found to be robust between two segmenters with minimum Redundancy Maximum Relevance feature selection within each leave-one-out-cross-validation to avoid overfitting. The performance was evaluated using a receiver operating characteristic curve. A final model was created using the RF classifier and aggregate minimum Redundancy Maximum Relevance to determine feature importance. RESULTS: A total of 103 studies were included in the analysis. The area under the receiver operating characteristic curve for RF, support vector classifier, XGBoost, and logistic regression was 0.81, 0.77, 0.84, and 0.84 in group A, and 0.88, 0.87, 0.85, and 0.81 in group B, respectively. Nine radiomic features in group A and 14 radiomic features in group B were predictive of SCLC. Six radiomic features overlapped between groups A and B. CONCLUSION: A machine learning radiomics model may help differentiate SCLC from other lung lesions.

    View details for DOI 10.1200/CCI.21.00021

    View details for PubMedID 34264747

  • Meta-learning reduces the amount of data needed to build AI models in oncology. British journal of cancer Gevaert, O. 2021

    Abstract

    Meta-learning is showing promise in recent genomic studies in oncology. Meta-learning can facilitate transfer learning and reduce the amount of data that is needed in a target domain by transferring knowledge from abundant genomic data in different source domains, enabling the use of AI in data-scarce scenarios.

    View details for DOI 10.1038/s41416-021-01358-1

    View details for PubMedID 33782563

  • An expanded universe of cancer targets. Cell Hahn, W. C., Bader, J. S., Braun, T. P., Califano, A., Clemons, P. A., Druker, B. J., Ewald, A. J., Fu, H., Jagu, S., Kemp, C. J., Kim, W., Kuo, C. J., McManus, M., B Mills, G., Mo, X., Sahni, N., Schreiber, S. L., Talamas, J. A., Tamayo, P., Tyner, J. W., Wagner, B. K., Weiss, W. A., Gerhard, D. S., Cancer Target Discovery and Development Network, Dancik, V., Gill, S., Hua, B., Sharifnia, T., Viswanathan, V., Zou, Y., Dela Cruz, F., Kung, A., Stockwell, B., Boehm, J., Dempster, J., Manguso, R., Vazquez, F., Cooper, L. A., Du, Y., Ivanov, A., Lonial, S., Moreno, C. S., Niu, Q., Owonikoko, T., Ramalingam, S., Reyna, M., Zhou, W., Grandori, C., Shmulevich, I., Swisher, E., Cai, J., Chan, I. S., Dunworth, M., Ge, Y., Georgess, D., Grasset, E. M., Henriet, E., Knutsdottir, H., Lerner, M. G., Padmanaban, V., Perrone, M. C., Suhail, Y., Tsehay, Y., Warrier, M., Morrow, Q., Nechiporuk, T., Long, N., Saultz, J., Kaempf, A., Minnier, J., Tognon, C. E., Kurtz, S. E., Agarwal, A., Brown, J., Watanabe-Smith, K., Vu, T. Q., Jacob, T., Yan, Y., Robinson, B., Lind, E. F., Kosaka, Y., Demir, E., Estabrook, J., Grzadkowski, M., Nikolova, O., Chen, K., Deneen, B., Liang, H., Bassik, M. C., Bhattacharya, A., Brennan, K., Curtis, C., Gevaert, O., Ji, H. P., Karlsson, K. A., Karagyozova, K., Lo, Y., Liu, K., Nakano, M., Sathe, A., Smith, A. R., Spees, K., Wong, W. H., Yuki, K., Hangauer, M., Kaufman, D. S., Balmain, A., Bollam, S. R., Chen, W., Fan, Q., Kersten, K., Krummel, M., Li, Y. R., Menard, M., Nasholm, N., Schmidt, C., Serwas, N. K., Yoda, H. 2021; 184 (5): 1142–55

    Abstract

    The characterization of cancer genomes has provided insight into somatically altered genes across tumors, transformed our understanding of cancer biology, and enabled tailoring of therapeutic strategies. However, the function of most cancer alleles remains mysterious, and many cancer features transcend their genomes. Consequently, tumor genomic characterization does not influence therapy for most patients. Approaches to understand the function and circuitry of cancer genes provide complementary approaches to elucidate both oncogene and non-oncogene dependencies. Emerging work indicates that the diversity of therapeutic targets engendered by non-oncogene dependencies is much larger than the list of recurrently mutated genes. Here we describe a framework for this expanded list of cancer targets, providing novel opportunities for clinical translation.

    View details for DOI 10.1016/j.cell.2021.02.020

    View details for PubMedID 33667368

  • Genetic mutation and biological pathway prediction based on whole slide images in breast carcinoma using deep learning. NPJ precision oncology Qu, H., Zhou, M., Yan, Z., Wang, H., Rustgi, V. K., Zhang, S., Gevaert, O., Metaxas, D. N. 2021; 5 (1): 87

    Abstract

    Breast carcinoma, the most common cancer among women worldwide, consists of a heterogeneous group of subtypes. Whole-slide images (WSIs) can capture cell-level heterogeneity and are routinely used for cancer diagnosis by pathologists. However, key driver genetic mutations related to targeted therapies are identified by genomic analysis such as high-throughput molecular profiling. In this study, we develop a deep-learning model to predict the genetic mutations and biological pathway activities directly from WSIs. Our study offers unique insights into WSI visual interactions between mutation and its related pathway, enabling a head-to-head comparison to reinforce our major findings. Using the histopathology images from the Genomic Data Commons Database, our model can predict the point mutations of six important genes (AUC 0.68-0.85) and copy number alteration of another six genes (AUC 0.69-0.79). Additionally, the trained models can predict the activities of three out of ten canonical pathways (AUC 0.65-0.79). Next, we visualized the weight maps of tumor tiles in WSI to understand the decision-making process of deep-learning models via a self-attention mechanism. We further validated our models on liver and lung cancers that are related to metastatic breast cancer. Our results provide insights into the association between pathological image features, molecular outcomes, and targeted therapies for breast cancer patients.

    View details for DOI 10.1038/s41698-021-00225-9

    View details for PubMedID 34556802

  • The relationship between brain injury criteria and brain strain across different types of head impacts can be different Journal of Royal Society Interface Zhan, X., Li, Y., Liu, Y., Domel, A. G., Vahid Alizadeh, H., Raymond, S. J., Ruan, J., Barbat, S., Tiernan, S., Gevaert, O., Zeineh, M., Grant, G., Camarillo, D. 2021; 18 (20210260)

    View details for DOI 10.1098/rsif.2021.0260

  • Rapid Estimation of Entire Brain Strain Using Deep Learning Models IEEE Transactions on Biomedical Engineering Zhan, X., Liu, Y., Raymond, S. J., Vahid Alizadeh, H., Domel, A. G., Gevaert, O., Zeineh, M. M., Grant, G. A., Camarillo, D. 2021: 11

    Abstract

    Many recent studies suggest that brain deformation resulting from head impacts is linked to the corresponding clinical outcome, such as mild traumatic brain injury (mTBI). Although several finite element (FE) head models have been developed and validated to calculate brain deformation based on impact kinematics, the clinical application of these FE head models is limited due to the time-consuming nature of FE simulations. This work aims to accelerate the brain deformation calculation and thus improve the potential for clinical applications. We propose a deep learning head model with a five-layer deep neural network and feature engineering, and trained and tested the model on 2511 head impacts from a combination of head model simulations and on-field college football and mixed martial arts impacts. The proposed deep learning head model can calculate the maximum principal strain (Green Lagrange) for every element in the entire brain in less than 0.001 s with an average root mean squared error of 0.022 and a standard deviation of 0.001 over twenty repeats with random data partition and model initialization. Trained and tested using the dataset of 2511 head impacts, this model can be applied to various sports in the calculation of brain strain with accuracy, and its applicability can be extended even further by incorporating data from other types of head impacts. In addition to the potential clinical application in real-time brain deformation monitoring, this model will help researchers estimate the brain strain from a large number of head impacts more efficiently than using FE models.

    View details for DOI 10.1109/TBME.2021.3073380

  • Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases Patterns Zhan, X., Humbert-Droz, M., Mukherjee, P., Gevaert, O. 2021: 100289
  • INTEGRATED MULTI-SCALE MODEL FOR PEDIATRIC BRAIN TUMOR SURVIVAL PREDICTION Qiu, Y., Sabran, A., Zheng, H., Gevaert, O. OXFORD UNIV PRESS INC. 2020: 440–41
  • Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics CELL SYSTEMS Yang, M., Petralia, F., Li, Z., Li, H., Ma, W., Song, X., Kim, S., Lee, H., Yu, H., Lee, B., Bae, S., Heo, E., Kaczmarczyk, J., Stepniak, P., Warchol, M., Yu, T., Calinawan, A. P., Boutros, P. C., Payne, S. H., Reva, B., Boja, E., Rodriguez, H., Stolovitzky, G., Guan, Y., Kang, J., Wang, P., Fenyo, D., Saez-Rodriguez, J., NCI-CPTAC-DREAM Consortium 2020; 11 (2): 186-+

    Abstract

    Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.

    View details for DOI 10.1016/j.cels.2020.06.013

    View details for Web of Science ID 000563112000007

    View details for PubMedID 32710834

  • Interreader Variability in Semantic Annotation of Microvascular Invasion in Hepatocellular Carcinoma on Contrast-enhanced Triphasic CT Images. Radiology. Imaging cancer Bakr, S., Gevaert, O., Patel, B., Kesselman, A., Shah, R., Napel, S., Kothary, N. 2020; 2 (3): e190062

    Abstract

    Purpose: To evaluate interreader agreement in annotating semantic features on preoperative CT images to predict microvascular invasion (MVI) in patients with hepatocellular carcinoma (HCC). Materials and Methods: Preoperative, contrast material-enhanced triphasic CT studies from 89 patients (median age, 64 years; age range, 36-85 years; 70 men) who underwent hepatic resection between 2008 and 2017 for a solitary HCC were reviewed. Three radiologists annotated CT images obtained during the arterial and portal venous phases, independently and in consensus, with features associated with MVI reported by other investigators. The assessed factors were the presence or absence of discrete internal arteries, hypoattenuating halo, tumor-liver difference, peritumoral enhancement, and tumor margin. Testing also included previously proposed MVI signatures: radiogenomic venous invasion (RVI) and two-trait predictor of venous invasion (TTPVI), using single-reader and consensus annotations. Cohen (two-reader) and Fleiss (three-reader) kappa and the bootstrap method were used to analyze interreader agreement and differences in model performance, respectively. Results: Of HCCs assessed, 32.6% (29 of 89) had MVI at histopathologic findings. Two-reader agreement, as assessed by pairwise Cohen kappa statistics, varied as a function of feature and imaging phase, ranging from 0.02 to 0.6; three-reader Fleiss kappa varied from -0.17 to 0.56. For RVI and TTPVI, the best single-reader performance had sensitivity and specificity of 52% and 77% and 67% and 74%, respectively. In consensus, the sensitivity and specificity for the RVI and TTPVI signatures were 59% and 67% and 70% and 62%, respectively. Conclusion: Interreader variability in semantic feature annotation remains a challenge and affects the reproducibility of predictive models for preoperative detection of MVI in HCC. Supplemental material is available for this article. © RSNA, 2020.

    View details for DOI 10.1148/rycan.2020190062

    View details for PubMedID 32550600

  • A CT-based radiomics nomogram for prediction of human epidermal growth factor receptor 2 status in patients with gastric cancer. Chinese journal of cancer research = Chung-kuo yen cheng yen chiu Li, Y., Cheng, Z., Gevaert, O., He, L., Huang, Y., Chen, X., Huang, X., Wu, X., Zhang, W., Dong, M., Huang, J., Huang, Y., Xia, T., Liang, C., Liu, Z. 2020; 32 (1): 62-71

    Abstract

    To develop and validate a computed tomography (CT)-based radiomics nomogram for predicting human epidermal growth factor receptor 2 (HER2) status in patients with gastric cancer. This retrospective study included 134 patients with gastric cancer (HER2-negative: n=87; HER2-positive: n=47) from April 2013 to March 2018, who were then randomly divided into training (n=94) and validation (n=40) cohorts. Radiomics features were obtained from the CT images showing gastric cancer. Least absolute shrinkage and selection operator (LASSO) regression analysis was utilized for building the radiomics signature. A multivariable logistic regression method was applied to develop a prediction model incorporating the radiomics signature and independent clinicopathologic risk predictors, which were then visualized as a radiomics nomogram. The predictive performance of the nomogram was assessed in the training and validation cohorts. The radiomics signature was significantly associated with HER2 status in both training (P<0.001) and validation (P=0.023) cohorts. The prediction model that incorporated the radiomics signature and carcinoembryonic antigen (CEA) level demonstrated good discriminative performance for HER2 status prediction, with an area under the curve (AUC) of 0.799 [95% confidence interval (95% CI): 0.704-0.894] in the training cohort and 0.771 (95% CI: 0.607-0.934) in the validation cohort. The calibration curve of the radiomics nomogram also showed good calibration. Decision curve analysis showed that the radiomics nomogram was useful. We built and validated a radiomics nomogram with good performance for HER2 status prediction in gastric cancer. This radiomics nomogram could serve as a non-invasive tool to predict HER2 status and guide clinical treatment.

    View details for DOI 10.21147/j.issn.1000-9604.2020.01.08

    View details for PubMedID 32194306

    View details for PubMedCentralID PMC7072015

  • Topological image modification for object detection and topological image processing of skin lesions. Scientific reports Vandaele, R., Nervo, G. A., Gevaert, O. 2020; 10 (1): 21061

    Abstract

    We propose a new method based on Topological Data Analysis (TDA) consisting of Topological Image Modification (TIM) and Topological Image Processing (TIP) for object detection. Through this newly introduced method, we artificially destroy irrelevant objects, and construct new objects with known topological properties in irrelevant regions of an image. This ensures that we are able to identify the important objects in relevant regions of the image. We do this by means of persistent homology, which allows us to simultaneously select appropriate thresholds, as well as the objects corresponding to these thresholds, and separate them from the noisy background of an image. This leads to a new image, processed in a completely unsupervised manner, from which one may more efficiently extract important objects. We demonstrate the usefulness of this proposed method for topological image processing through a case study of unsupervised segmentation of the ISIC 2018 skin lesion images. Code for this project is available at https://bitbucket.org/ghentdatascience/topimgprocess.

    View details for DOI 10.1038/s41598-020-77933-y

    View details for PubMedID 33273628

  • Segmentation and Classification in Digital Pathology for Glioma Research: Challenges and Deep Learning Approaches. Frontiers in neuroscience Kurc, T., Bakas, S., Ren, X., Bagari, A., Momeni, A., Huang, Y., Zhang, L., Kumar, A., Thibault, M., Qi, Q., Wang, Q., Kori, A., Gevaert, O., Zhang, Y., Shen, D., Khened, M., Ding, X., Krishnamurthi, G., Kalpathy-Cramer, J., Davis, J., Zhao, T., Gupta, R., Saltz, J., Farahani, K. 2020; 14: 27

    Abstract

    Biomedical imaging is an important source of information in cancer research. Characterizations of cancer morphology at onset, progression, and in response to treatment provide complementary information to that gleaned from genomics and clinical data. Accurate extraction and classification of both visual and latent image features is an increasingly complex challenge due to the increased complexity and resolution of biomedical image data. In this paper, we present four deep learning-based image analysis methods from the Computational Precision Medicine (CPM) satellite event of the 21st International Medical Image Computing and Computer Assisted Intervention (MICCAI 2018) conference. One method is a segmentation method designed to segment nuclei in whole slide tissue images (WSIs) of adult diffuse glioma cases. It achieved a Dice similarity coefficient of 0.868 with the CPM challenge datasets. Three methods are classification methods developed to categorize adult diffuse glioma cases into oligodendroglioma and astrocytoma classes using radiographic and histologic image data. These methods achieved accuracy values of 0.75, 0.80, and 0.90, measured as the ratio of the number of correct classifications to the number of total cases, with the challenge datasets. The evaluations of the four methods indicate that (1) carefully constructed deep learning algorithms are able to produce high accuracy in the analysis of biomedical image data and (2) the combination of radiographic with histologic image information improves classification performance.

    View details for DOI 10.3389/fnins.2020.00027

    View details for PubMedID 32153349

  • Whole slide images reflect DNA methylation patterns of human tumors. NPJ genomic medicine Zheng, H., Momeni, A., Cedoz, P. L., Vogel, H., Gevaert, O. 2020; 5 (1): 11

    Abstract

    DNA methylation is an important epigenetic mechanism regulating gene expression and its role in carcinogenesis has been extensively studied. High-throughput DNA methylation assays have been used broadly in cancer research. Histopathology images are commonly obtained in cancer treatment, given that tissue sampling remains the clinical gold-standard for diagnosis. In this work, we investigate the interaction between cancer histopathology images and DNA methylation profiles to provide a better understanding of tumor pathobiology at the epigenetic level. We demonstrate that classical machine learning algorithms can associate the DNA methylation profiles of cancer samples with morphometric features extracted from whole slide images. Furthermore, grouping the genes into methylation clusters greatly improves the performance of the models. The well-predicted genes are enriched in key pathways in carcinogenesis including hypoxia in glioma and angiogenesis in renal cell carcinoma. Our results provide new insights into the link between histopathological and molecular data.

    View details for DOI 10.1038/s41525-020-0120-9

    View details for PubMedID 33574267

  • Predicting the tumor response to chemoradiotherapy for rectal cancer: Model development and external validation using MRI radiomics. Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology Bulens, P., Couwenberg, A., Intven, M., Debucquoy, A., Vandecaveye, V., Van Cutsem, E., D'Hoore, A., Wolthuis, A., Mukherjee, P., Gevaert, O., Haustermans, K. 2019

    Abstract

    BACKGROUND: In patients who respond well to chemoradiotherapy for locally advanced rectal cancer (LARC), a watch-and-wait strategy can be considered. To implement organ-sparing strategies, accurate patient selection is needed. We investigate the use of MRI-based radiomics models to predict tumor response to improve patient selection. MATERIALS AND METHODS: Models were developed in a cohort of 70 patients and validated in an external cohort of 55 patients. Patients received chemoradiation followed by surgery and underwent T2-weighted and diffusion-weighted MRI (DW-MRI) before and after chemoradiation. The outcome measure was (near-)complete pathological tumor response (ypT0-1N0). Tumor segmentation was done on T2-images and transferred to b800-images and ADC maps, after which quantitative and four semantic features were extracted. We combined features using principal component analysis and built models using LASSO regression analysis. The best models based on precision and performance were selected for validation. RESULTS: 21/70 patients (30%) achieved ypT0-1N0 in the development cohort versus 13/55 patients (24%) in the validation cohort. Three models (t2_dwi_pre_post, semantic_dwi_adc_pre, semantic_dwi_post) were identified with an area-under-the-curve (AUC) of 0.83 (95% CI 0.70-0.95), 0.86 (95% CI 0.75-0.98) and 0.84 (95% CI 0.75-0.94) respectively. Two models (t2_dwi_pre_post, semantic_dwi_post) validated well in the external cohort with AUCs of 0.83 (95% CI 0.70-0.95) and 0.86 (95% CI 0.76-0.97). These models, however, did not outperform a previously established four-feature semantic model. CONCLUSION: Prediction models based on MRI radiomics non-invasively predict tumor response after chemoradiation for rectal cancer and can be used as an additional tool to identify patients eligible for an organ-preserving treatment.

    View details for DOI 10.1016/j.radonc.2019.07.033

    View details for PubMedID 31431368

  • Marking the Path Toward Artificial Intelligence-Based Image Classification in Dermatology. JAMA dermatology Novoa, R. A., Gevaert, O., Ko, J. M. 2019

    View details for DOI 10.1001/jamadermatol.2019.1633

    View details for PubMedID 31411643

  • Combined Analysis of Metabolomes, Proteomes, and Transcriptomes of Hepatitis C Virus-Infected Cells and Liver to Identify Pathways Associated With Disease Development GASTROENTEROLOGY Lupberger, J., Croonenborghs, T., Suarez, A., Van Renne, N., Juhling, F., Oudot, M. A., Virzi, A., Bandiera, S., Jamey, C., Meszaros, G., Brumaru, D., Mukherji, A., Durand, S. C., Heydmann, L., Verrier, E. R., El Saghire, H., Hamdane, N., Bartenschlager, R., Fereshetian, S., Ramberger, E., Sinha, R., Nabian, M., Everaert, C., Jovanovic, M., Mertins, P., Carr, S. A., Chayama, K., Dali-Youcef, N., Ricci, R., Bardeesy, N. M., Fujiwara, N., Gevaert, O., Zeisel, M. B., Hoshida, Y., Pochet, N., Baumert, T. F. 2019; 157 (2): 537-+
  • Comparison of single and module-based methods for modeling gene regulatory networks. Bioinformatics (Oxford, England) Hernaez, M., Blatti, C., Gevaert, O. 2019

    Abstract

    MOTIVATION: Gene regulatory networks describe the regulatory relationships among genes, and developing methods for reverse engineering these networks is an ongoing challenge in computational biology. The majority of the initially proposed methods for gene regulatory network discovery create a network of genes and then mine it in order to uncover previously unknown regulatory processes. More recent approaches have focused on inferring modules of co-regulated genes, linking these modules with regulatory genes and then mining them to discover new molecular biology. RESULTS: In this work we analyze module-based network approaches to build gene regulatory networks, and compare their performance to single gene network approaches. In the process, we propose a novel approach to estimate gene regulatory networks drawing from the module-based methods. We show that generating modules of co-expressed genes which are predicted by a sparse set of regulators using a variational Bayes method, and then building a bipartite graph on the generated modules using sparse regression, yields more informative networks than previous single and module-based network approaches as measured by: i) the rate of enriched gene sets, ii) a network topology assessment, iii) ChIP-Seq evidence, and iv) the KnowEnG Knowledge Network collection of previously characterized gene-gene interactions. AVAILABILITY: The code is written in R and can be downloaded from https://github.com/mikelhernaez/linker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btz549

    View details for PubMedID 31287491

  • Combined Analysis of Metabolomes, Proteomes, and Transcriptomes of HCV-infected Cells and Liver to Identify Pathways Associated With Disease Development. Gastroenterology Lupberger, J., Croonenborghs, T., Roca Suarez, A. A., Van Renne, N., Juhling, F., Oudot, M. A., Virzi, A., Bandiera, S., Jamey, C., Meszaros, G., Brumaru, D., Mukherji, A., Durand, S. C., Heydmann, L., Verrier, E. R., El Saghire, H., Hamdane, N., Bartenschlager, R., Fereshetian, S., Ramberger, E., Sinha, R., Nabian, M., Everaert, C., Jovanovic, M., Mertins, P., Carr, S. A., Chayama, K., Dali-Youcef, N., Ricci, R., Bardeesy, N. M., Fujiwara, N., Gevaert, O., Zeisel, M. B., Hoshida, Y., Pochet, N., Baumert, T. F. 2019; 157 (2)

    Abstract

    BACKGROUND & AIMS: The mechanisms of hepatitis C virus (HCV) infection, liver disease progression, and hepatocarcinogenesis are only partially understood. We performed genomic, proteomic, and metabolomic analyses of HCV-infected cells and chimeric mice to learn more about these processes. METHODS: Huh7.5.1dif (hepatocyte-like cells) were infected with culture-derived HCV and used in RNA-Seq, proteomic, metabolomic, and integrative genomic analyses. uPA/SCID mice were injected with serum from HCV-infected patients; 8 weeks later, liver tissues were collected and analyzed by RNA-seq and proteomics. Using differential expression, gene set enrichment analyses, and protein interaction mapping, we identified pathways that changed in response to HCV infection. We validated our findings in studies of liver tissues from 216 patients with HCV infection and early-stage cirrhosis and paired biopsies from 99 patients with hepatocellular carcinoma, including 17 patients with histologic features of steatohepatitis. Cirrhotic liver tissues from patients with HCV infection were classified into 2 groups based on relative peroxisome function; outcomes assessed included Child-Pugh class, development of hepatocellular carcinoma, survival and steatohepatitis. Hepatocellular carcinomas were classified according to steatohepatitis; the outcome was relative peroxisomal function. RESULTS: We quantified 21,950 mRNAs and 8297 proteins in HCV-infected cells. Upon HCV infection of hepatocyte-like cells and chimeric mice, we observed significant changes in levels of mRNAs and proteins involved in metabolism and hepatocarcinogenesis. HCV infection of hepatocyte-like cells significantly increased levels of mRNAs, but not proteins, that regulate the innate immune response; we believe this was due to the inhibition of translation in these cells. HCV infection of hepatocyte-like cells increased glucose consumption and metabolism and the STAT3 signaling pathway and reduced peroxisome function. Peroxisomes mediate beta-oxidation of very long-chain fatty acids (VLCFAs); we found intracellular accumulation of VLCFAs in HCV-infected cells, which is also observed in patients with fatty liver disease. Cells in livers from HCV-infected mice had significant reductions in levels of mRNAs and proteins associated with peroxisome function, indicating perturbation of peroxisomes. We associated defects in peroxisome function with outcomes and features of HCV-associated cirrhosis, fatty liver disease, and hepatocellular carcinoma in patients. CONCLUSIONS: We performed combined transcriptome, proteome, and metabolome analyses of liver tissues from HCV-infected hepatocyte-like cells and HCV-infected mice. We found that HCV infection increases glucose metabolism and the STAT3 signaling pathway and thereby reduces peroxisome function; alterations in expression of peroxisome genes were associated with outcomes of patients with liver diseases. These findings provide insights into liver disease pathogenesis and might be used to identify new therapeutic targets.

    View details for PubMedID 30978357

  • Artificial intelligence and dermatology: opportunities, challenges, and future directions. Seminars in cutaneous medicine and surgery Schlessinger, D. I., Chhor, G., Gevaert, O., Swetter, S. M., Ko, J., Novoa, R. A. 2019; 38 (1): E31–37

    Abstract

    The application of artificial intelligence (AI) to medicine has considerable potential within dermatology, where the majority of diagnoses are based on visual pattern recognition. Opportunities for AI in dermatology include the potential to automate repetitive tasks; optimize time-consuming tasks; extend limited medical resources; improve interobserver reliability issues; and expand the diagnostic toolbox of dermatologists. To achieve the full potential of AI, however, developers must aim to create algorithms representing diverse patient populations; ensure algorithm output is ultimately interpretable; validate algorithm performance prospectively; preserve human-patient interaction when necessary; and demonstrate validity in the eyes of regulatory bodies.

    View details for PubMedID 31051021

  • Predicting EGFR Mutation Status in Lung Adenocarcinoma on CT Image Using Deep Learning. The European respiratory journal Wang, S., Shi, J., Ye, Z., Dong, D., Yu, D., Zhou, M., Liu, Y., Gevaert, O., Wang, K., Zhu, Y., Zhou, H., Liu, Z., Tian, J. 2019; 53 (3)

    Abstract

    Epidermal Growth Factor Receptor (EGFR) genotyping is critical for treatment guidelines, such as the use of tyrosine kinase inhibitors in lung adenocarcinoma (LA). Conventional identification of EGFR genotype requires biopsy and sequence testing that is invasive and may suffer from the difficulty in accessing tissue samples. Here, we proposed a deep learning (DL) model to predict the EGFR mutation status in LA by non-invasive computed tomography (CT). We retrospectively collected 844 LA patients with preoperative CT image, EGFR mutation and clinical information from two hospitals. An end-to-end DL model was proposed to predict the EGFR mutation status by CT scanning. By training on 14926 CT images, the DL model achieved encouraging predictive performance in both the primary cohort (n=603; AUC=0.85, 95% CI 0.83-0.88) and the independent validation cohort (n=241; AUC=0.81, 95% CI 0.79-0.83), which showed significant improvement over previous studies using hand-crafted CT features or clinical characteristics (p<0.001). The deep learning score demonstrated significant difference in EGFR-mutant and EGFR-wild type tumours (p<0.001). Since CT is routinely used in lung cancer diagnosis, the DL model provides a non-invasive and easy-to-use method for EGFR mutation status prediction.

    View details for PubMedID 30635290

  • Non-invasive genotype prediction of chromosome 1p/19q co-deletion by development and validation of an MRI-based radiomics signature in lower-grade gliomas JOURNAL OF NEURO-ONCOLOGY Han, Y., Xie, Z., Zang, Y., Zhang, S., Gu, D., Zhou, M., Gevaert, O., Wei, J., Li, C., Chen, H., Du, J., Liu, Z., Dong, D., Tian, J., Zhou, D. 2018; 140 (2): 297–306

    Abstract

    To perform radiomics analysis for non-invasively predicting chromosome 1p/19q co-deletion in World Health Organization grade II and III (lower-grade) gliomas. This retrospective study included 277 patients histopathologically diagnosed with lower-grade glioma. Clinical parameters were recorded for each patient. We performed a radiomics analysis by extracting 647 MRI-based features and applied the random forest algorithm to generate a radiomics signature for predicting 1p/19q co-deletion in the training cohort (n = 184). The clinical model consisted of pertinent clinical factors, and was built using a logistic regression algorithm. A combined model, incorporating both the radiomics signature and related clinical factors, was also constructed. The receiver operating characteristics curve was used to evaluate the predictive performance. We further validated the predictability of the three developed models using a time-independent validation cohort (n = 93). The radiomics signature was constructed as an independent predictor for differentiating 1p/19q co-deletion genotypes, which demonstrated superior performance on both the training and validation cohorts with areas under curve (AUCs) of 0.887 and 0.760, respectively. These results outperformed the clinical model (AUCs of 0.580 and 0.627 on training and validation cohorts). The AUCs of the combined model were 0.885 and 0.753 on training and validation cohorts, respectively, which indicated that clinical factors did not present additional improvement for the prediction. Our study highlighted that an MRI-based radiomics signature can effectively identify the 1p/19q co-deletion in histopathologically diagnosed lower-grade gliomas, thereby offering the potential to facilitate non-invasive molecular subtype prediction of gliomas.

    View details for PubMedID 30097822

  • A radiogenomic dataset of non-small cell lung cancer. Scientific data Bakr, S., Gevaert, O., Echegaray, S., Ayers, K., Zhou, M., Shafiq, M., Zheng, H., Benson, J. A., Zhang, W., Leung, A. N., Kadoch, M., D Hoang, C., Shrager, J., Quon, A., Rubin, D. L., Plevritis, S. K., Napel, S. 2018; 5: 180202

    Abstract

    Medical image biomarkers of cancer promise improvements in patient care through advances in precision medicine. Compared to genomic biomarkers, image biomarkers provide the advantages of being non-invasive, and characterizing a heterogeneous tumor in its entirety, as opposed to limited tissue available via biopsy. We developed a unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT), Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, and segmentation maps of tumors in the CT scans. Imaging data are also paired with results of gene mutation analyses, gene expression microarrays and RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes. This dataset was created to facilitate the discovery of the underlying relationship between tumor molecular and medical image features, as well as the development and evaluation of prognostic medical image biomarkers.

    View details for PubMedID 30325352

  • MethylMix 2.0: an R package for identifying DNA methylation genes BIOINFORMATICS Cedoz, P., Prunello, M., Brennan, K., Gevaert, O. 2018; 34 (17): 3044-3046
  • NSD1 inactivation defines an immune cold, DNA hypomethylated subtype in squamous cell carcinoma Brennan, K., Gevaert, O., Sunwoo, J. B., Shin, J. AMER ASSOC CANCER RESEARCH. 2018
  • Benchmark of lncRNA quantification in RNA-Seq of cancer samples Zheng, H., Hernaez, M., Brennan, K., Gevaert, O. AMER ASSOC CANCER RESEARCH. 2018
  • Comprehensive analysis of cancer stemness Malta, T. M., Sokolov, A., Gentles, A. J., Burzykowski, T., Poisson, L., Weinstein, J., Kaminska, B., Huelsken, J., Omberg, L., Gevaert, O., Colaprico, A., Czerwinska, P., Mazurek, S., Mishra, L., Heyn, H., Krasnitz, A., Godwin, A. K., Lazar, A. J., Stuart, J. M., Hoadley, K., Laird, P. W., Noushmehr, H., Wiznerowicz, M., Canc Genome Atlas Res Network AMER ASSOC CANCER RESEARCH. 2018
  • Deep learning to predict survival prognosis for patients with non-small cell lung cancer using images and clinical data Lee, E. H., Zhou, M., Gamboa, N., Brennan, K., Itakura, H., Nair, V., Napel, S., Wong, S., Gevaert, O. AMER ASSOC CANCER RESEARCH. 2018
  • The ENGAGE study: Integrating neuroimaging, virtual reality and smartphone sensing to understand self-regulation for managing depression and obesity in a precision medicine model. Behaviour research and therapy Williams, L. M., Pines, A., Goldstein-Piekarski, A. N., Rosas, L. G., Kullar, M., Sacchet, M. D., Gevaert, O., Bailenson, J., Lavori, P. W., Dagum, P., Wandell, B., Correa, C., Greenleaf, W., Suppes, T., Perry, L. M., Smyth, J. M., Lewis, M. A., Venditti, E. M., Snowden, M., Simmons, J. M., Ma, J. 2018; 101: 58–70

    Abstract

    Precision medicine models for personalizing achieving sustained behavior change are largely outside of current clinical practice. Yet, changing self-regulatory behaviors is fundamental to the self-management of complex lifestyle-related chronic conditions such as depression and obesity - two top contributors to the global burden of disease and disability. To optimize treatments and address these burdens, behavior change and self-regulation must be better understood in relation to their neurobiological underpinnings. Here, we present the conceptual framework and protocol for a novel study, "Engaging self-regulation targets to understand the mechanisms of behavior change and improve mood and weight outcomes (ENGAGE)". The ENGAGE study integrates neuroscience with behavioral science to better understand the self-regulation related mechanisms of behavior change for improving mood and weight outcomes among adults with comorbid depression and obesity. We collect assays of three self-regulation targets (emotion, cognition, and self-reflection) in multiple settings: neuroimaging and behavioral lab-based measures, virtual reality, and passive smartphone sampling. By connecting human neuroscience and behavioral science in this manner within the ENGAGE study, we develop a prototype for elucidating the underlying self-regulation mechanisms of behavior change outcomes and their application in optimizing intervention strategies for multiple chronic diseases.

    View details for PubMedID 29074231

  • Development and validation of an MRI-based model to predict response to chemoradiotherapy for rectal cancer. Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology Bulens, P., Couwenberg, A., Haustermans, K., Debucquoy, A., Vandecaveye, V., Philippens, M., Zhou, M., Gevaert, O., Intven, M. 2018; 126 (3): 437–42

    Abstract

    To safely implement organ preserving treatment strategies for patients with rectal cancer, well-considered selection of patients with favourable response is needed. In this study, we develop and validate an MRI-based response predicting model. A multivariate model using T2-volumetric and DWI parameters before and 6 weeks after chemoradiation (CRT) was developed using a cohort of 85 rectal cancer patients and validated in an external cohort of 55 patients that underwent preoperative CRT. Twenty-two patients (26%) achieved ypT0-1N0 response in the development cohort versus 13 patients (24%) in the validation cohort. Two T2-volumetric parameters (ΔVolume% and Sphere_post) and two DWI parameters (ADC_avg_post and ADCratio_avg) were retained in a model predicting (near-)complete response (ypT0-1N0). In the development cohort, this model had a good predictive performance (AUC = 0.89; 95% CI 0.80-0.98). Validation of the model in an external cohort resulted in a similar performance (AUC = 0.88; 95% CI 0.79-0.98). An MRI-based prediction model of (near-)complete pathological response following CRT in rectal cancer patients, shows a high predictive performance in an external validation cohort. The clinically relevant features in the model make it an interesting tool for implementation of organ-preserving strategies in rectal cancer.

    View details for PubMedID 29395287

  • Non-Small Cell Lung Cancer Radiogenomics Map Identifies Relationships between Molecular and Imaging Phenotypes with Prognostic Implications. Radiology Zhou, M., Leung, A., Echegaray, S., Gentles, A., Shrager, J. B., Jensen, K. C., Berry, G. J., Plevritis, S. K., Rubin, D. L., Napel, S., Gevaert, O. 2018; 286 (1): 307–15

    Abstract

    Purpose To create a radiogenomic map linking computed tomographic (CT) image features and gene expression profiles generated by RNA sequencing for patients with non-small cell lung cancer (NSCLC). Materials and Methods A cohort of 113 patients with NSCLC diagnosed between April 2008 and September 2014 who had preoperative CT data and tumor tissue available was studied. For each tumor, a thoracic radiologist recorded 87 semantic image features, selected to reflect radiologic characteristics of nodule shape, margin, texture, tumor environment, and overall lung characteristics. Next, total RNA was extracted from the tissue and analyzed with RNA sequencing technology. Ten highly coexpressed gene clusters, termed metagenes, were identified, validated in publicly available gene-expression cohorts, and correlated with prognosis. Next, a radiogenomics map was built that linked semantic image features to metagenes by using the t statistic and the Spearman correlation metric with multiple testing correction. Results RNA sequencing analysis resulted in 10 metagenes that capture a variety of molecular pathways, including the epidermal growth factor (EGF) pathway. A radiogenomic map was created with 32 statistically significant correlations between semantic image features and metagenes. For example, nodule attenuation and margins are associated with the late cell-cycle genes, and a metagene that represents the EGF pathway was significantly correlated with the presence of ground-glass opacity and irregular nodules or nodules with poorly defined margins. Conclusion Radiogenomic analysis of NSCLC showed multiple associations between semantic image features and metagenes that represented canonical molecular pathways, and it can result in noninvasive identification of molecular properties of NSCLC. Online supplemental material is available for this article.

    View details for PubMedID 28727543

  • Prediction of EGFR and KRAS mutation in non-small cell lung cancer using quantitative 18F FDG-PET/CT metrics. Oncotarget Minamimoto, R., Jamali, M., Gevaert, O., Echegaray, S., Khuong, A., Hoang, C. D., Shrager, J. B., Plevritis, S. K., Rubin, D. L., Leung, A. N., Napel, S., Quon, A. 2017; 8 (32): 52792-52801

    Abstract

    This study investigated the relationship between epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations in non-small-cell lung cancer (NSCLC) and quantitative FDG-PET/CT parameters including tumor heterogeneity. 131 patients with NSCLC underwent staging FDG-PET/CT followed by tumor resection and histopathological analysis that included testing for the EGFR and KRAS gene mutations. Patient and lesion characteristics, including smoking habits and FDG uptake parameters, were correlated to each gene mutation. Never-smoker (P < 0.001) or low pack-year smoking history (p = 0.002) and female gender (p = 0.047) were predictive factors for the presence of the EGFR mutations. Being a current or former smoker was a predictive factor for the KRAS mutations (p = 0.018). The maximum standardized uptake value (SUVmax) of FDG uptake in lung lesions was a predictive factor of the EGFR mutations (p = 0.029), while metabolic tumor volume and total lesion glycolysis were not predictive. Amongst several tumor heterogeneity metrics included in our analysis, inverse coefficient of variation (1/COV) was a predictive factor (p < 0.02) of EGFR mutations status, independent of metabolic tumor diameter. Multivariate analysis showed that being a never-smoker was the most significant factor (p < 0.001) for the EGFR mutations in lung cancer overall. The tumor heterogeneity metric 1/COV and SUVmax were both predictive for the EGFR mutations in NSCLC in a univariate analysis. Overall, smoking status was the most significant factor for the presence of the EGFR and KRAS mutations in lung cancer.

    View details for DOI 10.18632/oncotarget.17782

    View details for PubMedID 28881771

    View details for PubMedCentralID PMC5581070

  • Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO). Journal of biomedical informatics Panahiazar, M., Dumontier, M., Gevaert, O. 2017; 72: 132-139

    Abstract

    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the dimensionality (the number of unique values) of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as that present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.

    View details for DOI 10.1016/j.jbi.2017.06.017

    View details for PubMedID 28625880

    View details for PubMedCentralID PMC5643580

  • Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation. Medical image analysis Wang, S., Zhou, M., Liu, Z., Liu, Z., Gu, D., Zang, Y., Dong, D., Gevaert, O., Tian, J. 2017; 40: 172-183

    Abstract

    Accurate lung nodule segmentation from computed tomography (CT) images is of great importance for image-driven lung cancer analysis. However, the heterogeneity of lung nodules and the presence of similar visual characteristics between nodules and their surroundings make it difficult for robust nodule segmentation. In this study, we propose a data-driven model, termed the Central Focused Convolutional Neural Networks (CF-CNN), to segment lung nodules from heterogeneous CT images. Our approach combines two key insights: 1) the proposed model captures a diverse set of nodule-sensitive features from both 3-D and 2-D CT images simultaneously; 2) when classifying an image voxel, the effects of its neighbor voxels can vary according to their spatial locations. We describe this phenomenon by proposing a novel central pooling layer retaining much information on voxel patch center, followed by a multi-scale patch learning strategy. Moreover, we design a weighted sampling to facilitate the model training, where training samples are selected according to their degree of segmentation difficulty. The proposed method has been extensively evaluated on the public LIDC dataset including 893 nodules and an independent dataset with 74 nodules from Guangdong General Hospital (GDGH). We showed that CF-CNN achieved superior segmentation performance with average dice scores of 82.15% and 80.02% for the two datasets respectively. Moreover, we compared our results with the inter-radiologists consistency on LIDC dataset, showing a difference in average dice score of only 1.98%.

    View details for DOI 10.1016/j.media.2017.06.014

    View details for PubMedID 28688283

    View details for PubMedCentralID PMC5661888

  • Quantitative imaging outperforms molecular markers when predicting response to chemoradiotherapy for rectal cancer. Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology Joye, I., Debucquoy, A., Deroose, C. M., Vandecaveye, V., Cutsem, E. V., Wolthuis, A., D'Hoore, A., Sagaert, X., Zhou, M., Gevaert, O., Haustermans, K. 2017; 124 (1): 104-109

    Abstract

    To explore the integration of imaging and molecular data for response prediction to chemoradiotherapy (CRT) for rectal cancer. Eighty-five rectal cancer patients underwent preoperative CRT. 18F-FDG PET/CT and diffusion-weighted imaging (DWI) were acquired before (TP1) and during CRT (TP2) and prior to surgery (TP3). Inflammatory cytokines and gene expression were analysed. Tumour response was defined as ypT0-1N0. Multivariate models were built combining the obtained parameters. Final models were calculated on the data combination with the highest AUC. Twenty-two patients (26%) achieved ypT0-1N0 response. 18F-FDG PET/CT had worse predictive performance than DWI and T2-volumetry (AUC 0.61±0.04, 0.72±0.03, and 0.72±0.02, respectively). Combining all imaging parameters increased the AUC to 0.81±0.03. Adding cytokines or gene expression did not improve the AUC (AUC of 0.72±0.06 and 0.79±0.04 respectively). Final models combining 18F-FDG PET/CT, DWI, and T2-weighted volumetry at all TPs and using only TP1 and TP3, allowed ypT0-1N0 prediction with a 75% sensitivity, 94% specificity and PPV of 80%. Combining 18F-FDG PET/CT, DWI, and T2-weighted MRI volumetry obtained before CRT and prior to surgery may help physicians in selecting rectal cancer patients for organ-preservation.

    View details for DOI 10.1016/j.radonc.2017.06.013

    View details for PubMedID 28647399

    View details for PubMedCentralID PMC5641595

  • Predictive radiogenomics modeling of EGFR mutation status in lung cancer SCIENTIFIC REPORTS Gevaert, O., Echegaray, S., Khuong, A., Hoang, C. D., Shrager, J. B., Jensen, K. C., Berry, G. J., Guo, H. H., Lau, C., Plevritis, S. K., Rubin, D. L., Napel, S., Leung, A. N. 2017; 7

    Abstract

    Molecular analysis of the mutation status for EGFR and KRAS are now routine in the management of non-small cell lung cancer. Radiogenomics, the linking of medical images with the genomic properties of human tumors, provides exciting opportunities for non-invasive diagnostics and prognostics. We investigated whether EGFR and KRAS mutation status can be predicted using imaging data. To accomplish this, we studied 186 cases of NSCLC with preoperative thin-slice CT scans. A thoracic radiologist annotated 89 semantic image features of each patient's tumor. Next, we built a decision tree to predict the presence of EGFR and KRAS mutations. We found a statistically significant model for predicting EGFR but not for KRAS mutations. The test set area under the ROC curve for predicting EGFR mutation status was 0.89. The final decision tree used four variables: emphysema, airway abnormality, the percentage of ground glass component and the type of tumor margin. The presence of either of the first two features predicts a wild type status for EGFR while the presence of any ground glass component indicates EGFR mutations. These results show the potential of quantitative imaging to predict molecular properties in a non-invasive manner, as CT imaging is more readily available than biopsies.

    View details for DOI 10.1038/srep41674

    View details for PubMedID 28139704

  • MicroRNA based Pan-Cancer Diagnosis and Treatment Recommendation BMC BIOINFORMATICS Cheerla, N., Gevaert, O. 2017; 18

    Abstract

    The current state-of-the-art in cancer diagnosis and treatment is not ideal; diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" instead of being personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their ease of access (circulating miRNA in the blood) and stability. There have been many studies showing the effectiveness of miRNA data in diagnosing specific cancer types, but few studies explore the role of miRNA in predicting treatment outcome. Here we go a step further, using tissue miRNA and clinical data across 21 cancers from 'The Cancer Genome Atlas' (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system, and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options. We achieved 97.2% accuracy for classification using a support vector machine classifier with radial basis. The accuracies improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define the accuracy as the ratio of the total number of instances correctly classified to the total instances. The classifier also performed well, achieving greater than 80% sensitivity for many cancer types on independent validation datasets. Many miRNAs selected by our feature selection algorithm had strong previous associations to various cancers and tumor progression. Then, using miRNA, clinical and treatment data and encoding it in a machine-learning readable format, we built a prognosis predictor model to predict the outcome of treatment with 85% accuracy. We used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis model, incorporating semi-supervised learning techniques to improve their accuracies with repeated use, were uploaded online for easy access. Our research is a step towards the final goal of diagnosing cancer and predicting treatment recommendations using non-invasive blood tests.

    View details for DOI 10.1186/s12859-016-1421-y

    View details for Web of Science ID 000392171000002

    View details for PubMedID 28086747

    View details for PubMedCentralID PMC5237282
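
    The accuracy definition quoted in the abstract above (correctly classified instances divided by total instances) can be sketched in a few lines. This is only an illustration: the function name and the cancer-type labels below are hypothetical and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one instance")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, for illustration only (not data from the paper).
true_labels = ["BRCA", "LUAD", "BRCA", "OV", "LUAD"]
predicted = ["BRCA", "LUAD", "OV", "OV", "LUAD"]
print(accuracy(true_labels, predicted))  # 4 of the 5 predictions match
```

    Overall accuracy can hide poor performance on rare classes, which is presumably why the abstract also reports per-cancer sensitivity.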

  • Radiomics in Brain Tumor: Image Assessment, Quantitative Feature Descriptors, and Machine-Learning Approaches. AJNR. American journal of neuroradiology Zhou, M., Scott, J., Chaudhury, B., Hall, L., Goldgof, D., Yeom, K. W., Iv, M., Ou, Y., Kalpathy-Cramer, J., Napel, S., Gillies, R., Gevaert, O., Gatenby, R. 2017

    Abstract

    Radiomics describes a broad set of computational methods that extract quantitative features from radiographic images. The resulting features can be used to inform imaging diagnosis, prognosis, and therapy response in oncology. However, major challenges remain for methodologic developments to optimize feature extraction and provide rapid information flow in clinical settings. Equally important, to be clinically useful, predictive radiomic properties must be clearly linked to meaningful biologic characteristics and qualitative imaging properties familiar to radiologists. Here we use a cross-disciplinary approach to highlight studies in radiomics. We review brain tumor radiologic studies (eg, imaging interpretation) through computational models (eg, computer vision and machine learning) that provide novel clinical insights. We outline current quantitative image feature extraction and prediction strategies with different levels of available clinical classes for supporting clinical decision-making. We further discuss machine-learning challenges and data opportunities to advance radiomic studies.

    View details for DOI 10.3174/ajnr.A5391

    View details for PubMedID 28982791

  • Intestinal Enteroendocrine Lineage Cells Possess Homeostatic and Injury-Inducible Stem Cell Activity Cell Stem Cell Yan, K., Gevaert, O., Zheng, G., Anchang, B., Probert, C., et al 2017; 21 (1): 78 - 90.e6

    Abstract

    Several cell populations have been reported to possess intestinal stem cell (ISC) activity during homeostasis and injury-induced regeneration. Here, we explored inter-relationships between putative mouse ISC populations by comparative RNA-sequencing (RNA-seq). The transcriptomes of multiple cycling ISC populations closely resembled Lgr5+ ISCs, the most well-defined ISC pool, but Bmi1-GFP+ cells were distinct and enriched for enteroendocrine (EE) markers, including Prox1. Prox1-GFP+ cells exhibited sustained clonogenic growth in vitro, and lineage-tracing of Prox1+ cells revealed long-lived clones during homeostasis and after radiation-induced injury in vivo. Single-cell mRNA-seq revealed two subsets of Prox1-GFP+ cells, one of which resembled mature EE cells while the other displayed low-level EE gene expression but co-expressed tuft cell markers, Lgr5 and Ascl2, reminiscent of label-retaining secretory progenitors. Our data suggest that the EE lineage, including mature EE cells, comprises a reservoir of homeostatic and injury-inducible ISCs, extending our understanding of cellular plasticity and stemness.

    View details for DOI 10.1016/j.stem.2017.06.014

    View details for PubMedCentralID PMC5642297

  • Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations. AMIA ... Annual Symposium proceedings. AMIA Symposium Martinez-Romero, M., O'Connor, M. J., Shankar, R. D., Panahiazar, M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A. 2017; 2017: 1272–81

    Abstract

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

    View details for PubMedID 29854196

  • NSD1 inactivation defines an immune cold, DNA hypomethylated subtype in squamous cell carcinoma. Scientific reports Brennan, K., Shin, J. H., Tay, J. K., Prunello, M., Gentles, A. J., Sunwoo, J. B., Gevaert, O. 2017; 7 (1): 17064

    Abstract

    Chromatin modifying enzymes are frequently mutated in cancer, resulting in widespread epigenetic deregulation. Recent reports indicate that inactivating mutations in the histone methyltransferase NSD1 define an intrinsic subtype of head and neck squamous cell carcinoma (HNSC) that features pronounced DNA hypomethylation. Here, we describe a similar hypomethylated subtype of lung squamous cell carcinoma (LUSC) that is enriched for both inactivating mutations and deletions in NSD1. The 'NSD1 subtypes' of HNSC and LUSC are highly correlated at the DNA methylation and gene expression levels, featuring ectopic expression of developmental transcription factors and genes that are also hypomethylated in Sotos syndrome, a congenital disorder caused by germline NSD1 mutations. Further, the NSD1 subtype of HNSC displays an 'immune cold' phenotype characterized by low infiltration of tumor-associated leukocytes, particularly macrophages and CD8+ T cells, as well as low expression of genes encoding the immunotherapy target PD-1 immune checkpoint receptor and its ligands. Using an in vivo model, we demonstrate that NSD1 inactivation results in reduced T cell infiltration into the tumor microenvironment, implicating NSD1 as a tumor cell-intrinsic driver of an immune cold phenotype. NSD1 inactivation therefore causes epigenetic deregulation across cancer sites, and has implications for immunotherapy.

    View details for PubMedID 29213088

  • A multi-view deep convolutional neural networks for lung nodule segmentation. Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference Wang, S., Zhou, M., Gevaert, O., Tang, Z., Dong, D., Liu, Z., Tian, J. 2017; 2017: 1752–55

    Abstract

    We present a multi-view convolutional neural network (MV-CNN) for lung nodule segmentation. The MV-CNN specializes in capturing a diverse set of nodule-sensitive features from axial, coronal and sagittal views in CT images simultaneously. The proposed network architecture consists of three CNN branches, where each branch includes seven stacked layers and takes multi-scale nodule patches as input. The three CNN branches are then integrated with a fully connected layer to predict whether the patch center voxel belongs to the nodule. The proposed method has been evaluated on 893 nodules from the public LIDC-IDRI dataset, where ground-truth annotations and CT imaging data were provided. We showed that MV-CNN demonstrated encouraging performance for segmenting various types of nodules including juxta-pleural, cavitary, and non-solid nodules, achieving an average dice similarity coefficient (DSC) of 77.67% and average surface distance (ASD) of 0.24, outperforming conventional image segmentation approaches.

    View details for DOI 10.1109/EMBC.2017.8037182

    View details for PubMedID 29060226
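
    The Dice similarity coefficient reported in the abstract above is conventionally defined as twice the overlap of two masks divided by the sum of their sizes. The sketch below assumes binary masks stored as nested lists; the function name and the toy masks are hypothetical and are not the authors' evaluation code.

```python
def dice_coefficient(pred_mask, truth_mask):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    pred = [v for row in pred_mask for v in row]
    truth = [v for row in truth_mask for v in row]
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * overlap / total


# Toy 2x3 masks, for illustration only.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
print(dice_coefficient(predicted, reference))  # 2 shared pixels, 3 + 3 foreground pixels
```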

  • Magnetic resonance perfusion image features uncover an angiogenic subgroup of glioblastoma patients with poor survival and better response to antiangiogenic treatment. Neuro-oncology Liu, T. T., Achrol, A. S., Mitchell, L. A., Rodriguez, S. A., Feroze, A., Kim, C., Chaudhary, N., Gevaert, O., Stuart, J. M., Harsh, G. R., Chang, S. D., Rubin, D. L. 2016

    Abstract

    In previous clinical trials, antiangiogenic therapies such as bevacizumab did not show efficacy in patients with newly diagnosed glioblastoma (GBM). This may be a result of the heterogeneity of GBM, which has a variety of imaging-based phenotypes and gene expression patterns. In this study, we sought to identify a phenotypic subtype of GBM patients who have distinct tumor-image features and molecular activities and who may benefit from antiangiogenic therapies. Quantitative image features characterizing subregions of tumors and the whole tumor were extracted from preoperative and pretherapy perfusion magnetic resonance (MR) images of 117 GBM patients in 2 independent cohorts. Unsupervised consensus clustering was performed to identify robust clusters of GBM in each cohort. Cox survival and gene set enrichment analyses were conducted to characterize the clinical significance and molecular pathway activities of the clusters. The differential treatment efficacy of antiangiogenic therapy between the clusters was evaluated. A subgroup of patients with elevated perfusion features was identified and was significantly associated with poor patient survival after accounting for other clinical covariates (P values <.01; hazard ratios > 3) consistently found in both cohorts. Angiogenesis and hypoxia pathways were enriched in this subgroup of patients, suggesting the potential efficacy of antiangiogenic therapy. Patients of the angiogenic subgroups pooled from both cohorts, who had chemotherapy information available, had significantly longer survival when treated with antiangiogenic therapy (log-rank P=.022). Our findings suggest that an angiogenic subtype of GBM patients may benefit from antiangiogenic therapy with improved overall survival.

    View details for DOI 10.1093/neuonc/now270

    View details for PubMedID 28007759

  • A Rapid Segmentation-Insensitive "Digital Biopsy" Method for Radiomic Feature Extraction: Method and Pilot Study Using CT Images of Non-Small Cell Lung Cancer. Tomography : a journal for imaging research Echegaray, S., Nair, V., Kadoch, M., Leung, A., Rubin, D., Gevaert, O., Napel, S. 2016; 2 (4): 283-294

    Abstract

    Quantitative imaging approaches compute features within images' regions of interest. Segmentation is rarely completely automatic, requiring time-consuming editing by experts. We propose a new paradigm, called "digital biopsy," that allows for the collection of intensity- and texture-based features from these regions at least 1 order of magnitude faster than the current manual or semiautomated methods. A radiologist reviewed automated segmentations of lung nodules from 100 preoperative volume computed tomography scans of patients with non-small cell lung cancer, and manually adjusted the nodule boundaries in each section, to be used as a reference standard, requiring up to 45 minutes per nodule. We also asked a different expert to generate a digital biopsy for each patient using a paintbrush tool to paint a contiguous region of each tumor over multiple cross-sections, a procedure that required an average of <3 minutes per nodule. We simulated additional digital biopsies using morphological procedures. Finally, we compared the features extracted from these digital biopsies with our reference standard using intraclass correlation coefficient (ICC) to characterize robustness. Comparing the reference standard segmentations to our digital biopsies, we found that 84/94 features had an ICC >0.7; comparing erosions and dilations, using a sphere of 1.5-mm radius, of our digital biopsies to the reference standard segmentations resulted in 41/94 and 53/94 features, respectively, with ICCs >0.7. We conclude that many intensity- and texture-based features remain consistent between the reference standard and our method while substantially reducing the amount of operator time required.

    View details for DOI 10.18383/j.tom.2016.00163

    View details for PubMedID 28612050

    View details for PubMedCentralID PMC5466872

  • Chromatin-Remodeling Complex SWI/SNF Controls Multidrug Resistance by Transcriptionally Regulating the Drug Efflux Pump ABCB1 CANCER RESEARCH Dubey, R., Lebensohn, A. M., Bahrami-Nejad, Z., Marceau, C., Champion, M., Gevaert, O., Sikic, B. I., Carette, J. E., Rohatgi, R. 2016; 76 (19): 5810-5821

    Abstract

    Anthracyclines are among the most effective yet most toxic drugs used in the oncology clinic. The nucleosome-remodeling SWI/SNF complex, a potent tumor suppressor, is thought to promote sensitivity to anthracyclines by recruiting topoisomerase IIα (TOP2A) to DNA and increasing double-strand breaks. In this study, we discovered a novel mechanism through which SWI/SNF influences resistance to the widely used anthracycline doxorubicin based on the use of a forward genetic screen in haploid human cells, followed by a rigorous single and double-mutant epistasis analysis using CRISPR/Cas9-mediated engineering. Doxorubicin resistance conferred by loss of the SMARCB1 subunit of the SWI/SNF complex was caused by transcriptional upregulation of a single gene, encoding the multidrug resistance pump ABCB1. Remarkably, both ABCB1 upregulation and doxorubicin resistance caused by SMARCB1 loss were dependent on the function of SMARCA4, a catalytic subunit of the SWI/SNF complex. We propose that residual SWI/SNF complexes lacking SMARCB1 are vital determinants of drug sensitivity, not just to TOP2A-targeted agents, but to the much broader range of cancer drugs effluxed by ABCB1.

    View details for DOI 10.1158/0008-5472.CAN-16-0716

    View details for Web of Science ID 000385625500025

    View details for PubMedID 27503929

    View details for PubMedCentralID PMC5050136

  • Transforming Big Data into Cancer-Relevant Insight: An Initial, Multi-Tier Approach to Assess Reproducibility and Relevance The Cancer Target Discovery and Development Network MOLECULAR CANCER RESEARCH Clemons, P. A., Shamji, A., Hon, C., Wagner, B. K., Schreiber, S. L., Krasnitz, A., Sordella, R., Sander, C., Lowe, S. W., Powers, S., Smith, K., Aburi, M., Lavarone, A., Lasorella, A., Silva, J., Stockwell, B., Califano, A., Boehm, J. S., Vazquez, F., Weir, B. A., Golub, T. R., Hahn, W. C., Khuri, F. R., Moreno, C. S., Du, Y., Cooper, L., Ivanov, A. A., Johns, M. A., Fu, H., Nikolova, O., Mendez, E., Gadi, V. K., Margolin, A. A., Grandori, C., Kemp, C. J., Warren, E. H., Riddell, S. R., McIntosh, M. W., Gevaert, O., Ji, H. P., Kuo, C. J., Dhruv, H., Finlay, D., Kiefer, J., Kim, S., Vuori, K., Berens, M. E., Weissman, J., Bivona, T., Bandyopadhyay, S., Hangauer, M., Boettcher, M., McManus, M., McCormick, F., Aksoy, O., Simonds, E. F., Zheng, T., Chen, J., An, Z., Balmain, A., Weiss, W. A., Chen, K., Liang, H., Scott, K. L., Mills, G. B., Posner, B. A., Macmillan, J., Minna, J., White, M. A., Roth, M. G., Jagu, S., Mazerik, J. N., Gerhard, D. S. 2016; 14 (8): 675-682
  • CoINcIDE: A framework for discovery of patient subtypes across multiple datasets GENOME MEDICINE Planey, C. R., Gevaert, O. 2016; 8

    Abstract

    Patient disease subtypes have the potential to transform personalized medicine. However, many patient subtypes derived from unsupervised clustering analyses on high-dimensional datasets are not replicable across multiple datasets, limiting their clinical utility. We present CoINcIDE, a novel methodological framework for the discovery of patient subtypes across multiple datasets that requires no between-dataset transformations. We also present a high-quality database collection, curatedBreastData, with over 2,500 breast cancer gene expression samples. We use CoINcIDE to discover novel breast and ovarian cancer subtypes with prognostic significance and novel hypothesized ovarian therapeutic targets across multiple datasets. CoINcIDE and curatedBreastData are available as R packages.

    View details for DOI 10.1186/s13073-016-0281-4

    View details for Web of Science ID 000371588100001

    View details for PubMedID 26961683

    View details for PubMedCentralID PMC4784276

  • Single Gene Prognostic Biomarkers in Ovarian Cancer: A Meta-Analysis PLOS ONE Willis, S., Villalobos, V. M., Gevaert, O., Abramovitz, M., Williams, C., Sikic, B. I., Leyland-Jones, B. 2016; 11 (2)

    Abstract

    To discover novel prognostic biomarkers in ovarian serous carcinomas. A meta-analysis of all single gene probes in the TCGA and HAS ovarian cohorts was performed to identify possible biomarkers using Cox regression as a continuous variable for overall survival. Genes were ranked by p-value using Stouffer's method and selected for statistical significance with a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method. Twelve genes with high mRNA expression were prognostic of poor outcome with an FDR <.05 (AXL, APC, RAB11FIP5, C19orf2, CYBRD1, PINK1, LRRN3, AQP1, DES, XRCC4, BCHE, and ASAP3). Twenty genes with low mRNA expression were prognostic of poor outcome with an FDR <.05 (LRIG1, SLC33A1, NUCB2, POLD3, ESR2, GOLPH3, XBP1, PAXIP1, CYB561, POLA2, CDH1, GMNN, SLC37A4, FAM174B, AGR2, SDR39U1, MAGT1, GJB1, SDF2L1, and C9orf82). A meta-analysis of all single genes identified thirty-two candidate biomarkers for their possible role in ovarian serous carcinoma. These genes can provide insight into the drivers or regulators of ovarian cancer and should be evaluated in future studies. Genes with high expression indicating poor outcome are possible therapeutic targets with known antagonists or inhibitors. Additionally, the genes could be combined into a prognostic multi-gene signature and tested in future ovarian cohorts.

    View details for DOI 10.1371/journal.pone.0149183

    View details for Web of Science ID 000371218400064

    View details for PubMedID 26886260

    View details for PubMedCentralID PMC4757072
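
    The two statistical steps named in the abstract above, combining per-cohort p-values with Stouffer's method and then controlling the false discovery rate with the Benjamini-Hochberg procedure, can be sketched as follows. This is a minimal illustration, assuming unweighted, one-sided p-values strictly between 0 and 1; the abstract does not state the weighting or sidedness used, and the gene names and p-values below are made up.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(p_values):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method."""
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value stays monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from two cohorts, for illustration only.
cohort_p = {
    "GENE_A": [0.004, 0.010],
    "GENE_B": [0.200, 0.450],
    "GENE_C": [0.030, 0.020],
}
genes = list(cohort_p)
combined = [stouffer_combine(cohort_p[g]) for g in genes]
fdr = benjamini_hochberg(combined)
selected = [g for g, q in zip(genes, fdr) if q < 0.05]
print(selected)
```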

  • Development of prognostic signatures for intermediate-risk papillary thyroid cancer. BMC cancer Brennan, K., Holsinger, C., Dosiou, C., Sunwoo, J. B., Akatsu, H., Haile, R., Gevaert, O. 2016; 16 (1): 736

    Abstract

    The incidence of papillary thyroid carcinoma (PTC), the most common type of thyroid malignancy, has risen rapidly worldwide. PTC usually has an excellent prognosis. However, the rising incidence of PTC, due at least partially to widespread use of neck imaging studies with increased detection of small cancers, has created a clinical issue of overdiagnosis, and consequential overtreatment. We investigated how molecular data can be used to develop a prognostic signature for PTC. The Cancer Genome Atlas (TCGA) recently reported on the genomic landscape of a large cohort of PTC cases. In order to decrease unnecessary morbidity associated with overdiagnosing PTC patients with good prognosis, we used TCGA data to develop a gene expression signature to distinguish between patients with good and poor prognosis. We selected a set of clinical phenotypes to define an 'extreme poor' prognosis group and an 'extreme good' prognosis group and developed a gene signature that characterized these. We discovered a gene expression signature that distinguished the extreme good from extreme poor prognosis patients. Next, we applied this signature to the remaining intermediate risk patients, and showed that they can be classified in clinically meaningful risk groups, characterized by established prognostic disease phenotypes. Analysis of the genes in the signature shows many known and novel genes involved in PTC prognosis. This work demonstrates that using a selection of clinical phenotypes and treatment variables, it is possible to develop a statistically useful and biologically meaningful gene signature of PTC prognosis, which may be developed as a biomarker to help prevent overdiagnosis.

    View details for DOI 10.1186/s12885-016-2771-6

    View details for PubMedID 27633254

  • Predicting structured metadata from unstructured metadata. Database : the journal of biological databases and curation Posch, L., Panahiazar, M., Dumontier, M., Gevaert, O. 2016; 2016

    Abstract

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation model (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms can be most accurately predicted using the TF-IDF approach, followed by LDA, both outperforming the majority vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction.

    View details for DOI 10.1093/database/baw080

    View details for PubMedID 28637268

    View details for PubMedCentralID PMC4892825
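
    The TF-IDF features mentioned in the abstract above weight a term by how often it occurs in one document and how rare it is across the collection. The sketch below uses one common variant (relative term frequency times the natural log of N / document frequency) with whitespace tokenization; the exact weighting and preprocessing used in the paper are not given in the abstract, and the sample descriptions are invented.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented free-text sample descriptions, for illustration only.
descriptions = [
    "breast tumor tissue",
    "normal breast tissue",
    "lung tumor biopsy",
]
for vector in tfidf(descriptions):
    print(vector)
```

    Vectors like these would then be passed to a classifier that predicts a structured metadata field; a term that appears in every document receives weight zero and contributes nothing to the prediction.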

  • Magnetic resonance perfusion image features uncover an angiogenic subgroup of glioblastoma patients with poor survival and better response to antiangiogenic treatment. Neuro-Oncology Liu, T. T., Achrol, A. S., Mitchell, L. A., Rodriguez, S. A., Feroze, A., Iv, M., Kim, C., Chaudhary, N., Gevaert, O., Stuart, J. M., Harsh, G. R., Chang, S. D., Rubin, D. L. 2016

    Abstract

    In previous clinical trials, antiangiogenic therapies such as bevacizumab did not show efficacy in patients with newly diagnosed glioblastoma (GBM). This may be a result of the heterogeneity of GBM, which has a variety of imaging-based phenotypes and gene expression patterns. In this study, we sought to identify a phenotypic subtype of GBM patients who have distinct tumor-image features and molecular activities and who may benefit from antiangiogenic therapies. Quantitative image features characterizing subregions of tumors and the whole tumor were extracted from preoperative and pretherapy perfusion magnetic resonance (MR) images of 117 GBM patients in 2 independent cohorts. Unsupervised consensus clustering was performed to identify robust clusters of GBM in each cohort. Cox survival and gene set enrichment analyses were conducted to characterize the clinical significance and molecular pathway activities of the clusters. The differential treatment efficacy of antiangiogenic therapy between the clusters was evaluated. A subgroup of patients with elevated perfusion features was identified and was significantly associated with poor patient survival after accounting for other clinical covariates (P values <.01; hazard ratios > 3) consistently found in both cohorts. Angiogenesis and hypoxia pathways were enriched in this subgroup of patients, suggesting the potential efficacy of antiangiogenic therapy. Patients of the angiogenic subgroups pooled from both cohorts, who had chemotherapy information available, had significantly longer survival when treated with antiangiogenic therapy (log-rank P=.022). Our findings suggest that an angiogenic subtype of GBM patients may benefit from antiangiogenic therapy with improved overall survival.

    View details for DOI 10.1093/neuonc/now270

  • COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K SCIENTIFIC REPORTS Sweeney, T. E., Chen, A. C., Gevaert, O. 2015; 5

    Abstract

    In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.

    View details for DOI 10.1038/srep16971

    View details for Web of Science ID 000364936000001

    View details for PubMedID 26581809

  • The center for expanded data annotation and retrieval. Journal of the American Medical Informatics Association Musen, M. A., Bean, C. A., Cheung, K., Dumontier, M., Durante, K. A., Gevaert, O., Gonzalez-Beltran, A., Khatri, P., Kleinstein, S. H., O'Connor, M. J., Pouliot, Y., Rocca-Serra, P., Sansone, S., Wiser, J. A. 2015; 22 (6): 1148-1152

    Abstract

    The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

    View details for DOI 10.1093/jamia/ocv048

    View details for PubMedID 26112029

  • Core samples for radiomics features that are insensitive to tumor segmentation: method and pilot study using CT images of hepatocellular carcinoma. Journal of medical imaging (Bellingham, Wash.) Echegaray, S., Gevaert, O., Shah, R., Kamaya, A., Louie, J., Kothary, N., Napel, S. 2015; 2 (4): 041011

    Abstract

    The purpose of this study is to investigate the utility of obtaining "core samples" of regions in CT volume scans for extraction of radiomic features. We asked four readers to outline tumors in three representative slices from each phase of multiphasic liver CT images taken from 29 patients (1128 segmentations) with hepatocellular carcinoma. Core samples were obtained by automatically tracing the maximal circle inscribed in the outlines. Image features describing the intensity, texture, shape, and margin were used to describe the segmented lesion. We calculated the intraclass correlation between the features extracted from the readers' segmentations and their core samples to characterize robustness to segmentation between readers, and between human-based segmentation and core sampling. We conclude that despite the high interreader variability in manually delineating the tumor (average overlap of 43% across all readers), certain features such as intensity and texture features are robust to segmentation. More importantly, this same subset of features can be obtained from the core samples, providing as much information as detailed segmentation while being simpler and faster to obtain.

    View details for DOI 10.1117/1.JMI.2.4.041011

    View details for PubMedID 26587549

  • Addition of MR imaging features and genetic biomarkers strengthens glioblastoma survival prediction in TCGA patients. Journal of neuroradiology. Journal de neuroradiologie Nicolasjilwan, M., Hu, Y., Yan, C., Meerzaman, D., Holder, C. A., Gutman, D., Jain, R., Colen, R., Rubin, D. L., Zinn, P. O., Hwang, S. N., Raghavan, P., Hammoud, D. A., Scarpace, L. M., Mikkelsen, T., Chen, J., Gevaert, O., Buetow, K., Freymann, J., Kirby, J., Flanders, A. E., Wintermark, M. 2015; 42 (4): 212-221

    Abstract

    The purpose of our study was to assess whether a model combining clinical factors, MR imaging features, and genomics would better predict overall survival of patients with glioblastoma (GBM) than either individual data type. The study was conducted leveraging The Cancer Genome Atlas (TCGA) effort supported by the National Institutes of Health. Six neuroradiologists reviewed MRI images from The Cancer Imaging Archive (http://cancerimagingarchive.net) of 102 GBM patients using the VASARI scoring system. The patients' clinical and genetic data were obtained from the TCGA website (http://www.cancergenome.nih.gov/). Patient outcome was measured in terms of overall survival time. The association between different categories of biomarkers and survival was evaluated using Cox analysis. The features that were significantly associated with survival were: (1) clinical factors: chemotherapy; (2) imaging: proportion of tumor contrast enhancement on MRI; and (3) genomics: HRAS copy number variation. The combination of these three biomarkers resulted in an incremental increase in the strength of prediction of survival, with the model that included clinical, imaging, and genetic variables having the highest predictive accuracy (area under the curve 0.679±0.068, Akaike's information criterion 566.7, P<0.001). A combination of clinical factors, imaging features, and HRAS copy number variation best predicts survival of patients with GBM.

    View details for DOI 10.1016/j.neurad.2014.02.006

    View details for PubMedID 24997477

  • DNA Methylation-Guided Prediction of Clinical Failure in High-Risk Prostate Cancer PLOS ONE Litovkin, K., Van Eynde, A., Joniau, S., Lerut, E., Laenen, A., Gevaert, T., Gevaert, O., Spahn, M., Kneitz, B., Gramme, P., Helleputte, T., Isebaert, S., Haustermans, K., Bollen, M. 2015; 10 (6)

    Abstract

    Prostate cancer (PCa) is a very heterogeneous disease with respect to clinical outcome. This study explored differential DNA methylation in a priori selected genes to diagnose PCa and predict clinical failure (CF) in high-risk patients. A quantitative multiplex, methylation-specific PCR assay was developed to assess promoter methylation of the APC, CCND2, GSTP1, PTGS2 and RARB genes in formalin-fixed, paraffin-embedded tissue samples from 42 patients with benign prostatic hyperplasia and radical prostatectomy specimens of patients with high-risk PCa, encompassing training and validation cohorts of 147 and 71 patients, respectively. Log-rank tests, univariate and multivariate Cox models were used to investigate the prognostic value of the DNA methylation. Hypermethylation of APC, CCND2, GSTP1, PTGS2 and RARB was highly cancer-specific. However, only GSTP1 methylation was significantly associated with CF in both independent high-risk PCa cohorts. Importantly, trichotomization into low, moderate and high GSTP1 methylation level subgroups was highly predictive for CF. Patients with either a low or high GSTP1 methylation level, as compared to the moderate methylation groups, were at a higher risk for CF in both the training (Hazard ratio [HR], 3.65; 95% CI, 1.65 to 8.07) and validation sets (HR, 4.27; 95% CI, 1.03 to 17.72) as well as in the combined cohort (HR, 2.74; 95% CI, 1.42 to 5.27) in multivariate analysis. Classification of primary high-risk tumors into three subtypes based on DNA methylation can be combined with clinico-pathological parameters for a more informative risk-stratification of these PCa patients.

    View details for DOI 10.1371/journal.pone.0130651

    View details for Web of Science ID 000356567500126

    View details for PubMedID 26086362

    View details for PubMedCentralID PMC4472347

  • MethylMix: an R package for identifying DNA methylation-driven genes BIOINFORMATICS Gevaert, O. 2015; 31 (11): 1839-1841

    Abstract

    DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper- and hypomethylation of genes is an alternative mechanism to deregulate gene expression in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed, generating vast amounts of genome-wide DNA methylation measurements. Yet, few tools exist that can formally identify hypo- and hypermethylated genes that are predictive of transcription and thus functionally relevant for a particular disease. To address this lack of tools, we developed MethylMix, an algorithm implemented in R to identify disease-specific hyper- and hypomethylated genes. MethylMix is based on a beta mixture model to identify methylation states and compares them with the normal DNA methylation state. MethylMix introduces a novel metric, the 'Differential Methylation value' or DM-value, defined as the difference of a methylation state with the normal methylation state. Finally, matched gene expression data are used to identify, besides differential, transcriptionally predictive methylation states by focusing on methylation changes that affect gene expression. MethylMix was implemented as an R package and is available in Bioconductor.

    View details for DOI 10.1093/bioinformatics/btv020

    View details for Web of Science ID 000356625300021

    View details for PubMedID 25609794

    View details for PubMedCentralID PMC4443673

  • Combining bevacizumab and chemoradiation in rectal cancer. Translational results of the AXEBeam trial. British journal of cancer Verstraete, M., Debucquoy, A., Dekervel, J., van Pelt, J., Verslype, C., Devos, E., Chiritescu, G., Dumon, K., D'Hoore, A., Gevaert, O., Sagaert, X., Van Cutsem, E., Haustermans, K. 2015; 112 (8): 1314-1325

    Abstract

    This study characterises molecular effect of bevacizumab, and explores the relation of molecular and genetic markers with response to bevacizumab combined with chemoradiotherapy (CRT). From a subset of 59 patients of 84 rectal cancer patients included in a phase II study combining bevacizumab with CRT, tumour and blood samples were collected before and during treatment, offering the possibility to evaluate changes induced by one dose of bevacizumab. We performed cDNA microarrays, stains for CD31/CD34 combined with α-SMA and CA-IX, as well as enzyme-linked immunosorbent assay (ELISA) for circulating angiogenic proteins. Markers were related with the pathological response of patients. One dose of bevacizumab changed the expression of 14 genes and led to a significant decrease in microvessel density and in the proportion of pericyte-covered blood vessels, and a small but nonsignificant increase in hypoxia. Alterations in angiogenic processes after bevacizumab delivery were only detected in responding tumours. Lower PDGFA expression and PDGF-BB levels, less pericyte-covered blood vessels and higher CA-IX expression were found after bevacizumab treatment only in patients with pathological complete response. We could not support the 'normalization hypothesis' and suggest a role for PDGFA, PDGF-BB, CA-IX and α-SMA. Validation in larger patient groups is needed.

    View details for DOI 10.1038/bjc.2015.93

    View details for PubMedID 25867261

  • Methylation of PITX2, HOXD3, RASSF1 and TDRD1 predicts biochemical recurrence in high-risk prostate cancer JOURNAL OF CANCER RESEARCH AND CLINICAL ONCOLOGY Litovkin, K., Joniau, S., Lerut, E., Laenen, A., Gevaert, O., Spahn, M., Kneitz, B., Isebaert, S., Haustermans, K., Beullens, M., Van Eynde, A., Bollen, M. 2014; 140 (11): 1849-1861
  • Glioblastoma multiforme: exploratory radiogenomic analysis by using quantitative image features. Radiology Gevaert, O., Mitchell, L. A., Achrol, A. S., Xu, J., Echegaray, S., Steinberg, G. K., Cheshier, S. H., Napel, S., Zaharchuk, G., Plevritis, S. K. 2014; 273 (1): 168-174

    Abstract

    To derive quantitative image features from magnetic resonance (MR) images that characterize the radiographic phenotype of glioblastoma multiforme (GBM) lesions and to create radiogenomic maps associating these features with various molecular data. Clinical, molecular, and MR imaging data for GBMs in 55 patients were obtained from the Cancer Genome Atlas and the Cancer Imaging Archive after local ethics committee and institutional review board approval. Regions of interest (ROIs) corresponding to enhancing necrotic portions of tumor and peritumoral edema were drawn, and quantitative image features were derived from these ROIs. Robust quantitative image features were defined on the basis of an intraclass correlation coefficient of 0.6 for a digital algorithmic modification and a test-retest analysis. The robust features were visualized by using hierarchic clustering and were correlated with survival by using Cox proportional hazards modeling. Next, these robust image features were correlated with manual radiologist annotations from the Visually Accessible Rembrandt Images (VASARI) feature set and GBM molecular subgroups by using nonparametric statistical tests. A bioinformatic algorithm was used to create gene expression modules, defined as a set of coexpressed genes together with a multivariate model of cancer driver genes predictive of the module's expression pattern. Modules were correlated with robust image features by using the Spearman correlation test to create radiogenomic maps and to link robust image features with molecular pathways. Eighteen image features passed the robustness analysis and were further analyzed for the three types of ROIs, for a total of 54 image features. Three enhancement features were significantly correlated with survival, 77 significant correlations were found between robust quantitative features and the VASARI feature set, and seven image features were correlated with molecular subgroups (P < .05 for all). A radiogenomics map was created to link image features with gene expression modules and allowed linkage of 56% (30 of 54) of the image features with biologic processes. Radiogenomic approaches in GBM have the potential to predict clinical and molecular characteristics of tumors noninvasively. Online supplemental material is available for this article.

    View details for DOI 10.1148/radiol.14131731

    View details for PubMedID 24827998

  • Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture NATURE MEDICINE Li, X., Nadauld, L., Ootani, A., Corney, D. C., Pai, R. K., Gevaert, O., Cantrell, M. A., Rack, P. G., Neal, J. T., Chan, C. W., Yeung, T., Gong, X., Yuan, J., Wilhelmy, J., Robine, S., Attardi, L. D., Plevritis, S. K., Hung, K. E., Chen, C., Ji, H. P., Kuo, C. J. 2014; 20 (7): 769-777

    Abstract

    The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.

    View details for DOI 10.1038/nm.3585

    View details for Web of Science ID 000338689500021

    View details for PubMedID 24859528

  • NF-κB protein expression associates with (18)F-FDG PET tumor uptake in non-small cell lung cancer: A radiogenomics validation study to understand tumor metabolism. Lung cancer Nair, V. S., Gevaert, O., Davidzon, G., Plevritis, S. K., West, R. 2014; 83 (2): 189-196

    Abstract

    We previously demonstrated that NF-κB may be associated with (18)F-FDG PET uptake and patient prognosis using radiogenomics in patients with non-small cell lung cancer (NSCLC). To validate these results, we assessed NF-κB protein expression in an extended cohort of NSCLC patients. We examined NF-κBp65 by immunohistochemistry (IHC) using a Tissue Microarray. Staining intensity was assessed by qualitative ordinal scoring and compared to tumor FDG uptake (SUVmax and SUVmean), lactate dehydrogenase A (LDHA) expression (as a positive control) and outcome using ANOVA, Kaplan Meier (KM), and Cox-proportional hazards (CPH) analysis. 365 tumors from 355 patients with long-term follow-up were analyzed. The average age for patients was 67±11 years, 46% were male and 67% were ever smokers. Stage I and II patients comprised 83% of the cohort and the majority had adenocarcinoma (73%). From 88 FDG PET scans available, average SUVmax and SUVmean were 8.3±6.6, and 3.7±2.4 respectively. Increasing NF-κBp65 expression, but not LDHA expression, was associated with higher SUVmax and SUVmean (p=0.03 and 0.02 respectively). Both NF-κBp65 and positive FDG uptake were significantly associated with more advanced stage, tumor histology and invasion. Higher NF-κBp65 expression was associated with death by KM analysis (p=0.06) while LDHA was strongly associated with recurrence (p=0.04). Increased levels of combined NF-κBp65 and LDHA expression were synergistic and associated with both recurrence (p=0.04) and death (p=0.03). NF-κB IHC was a modest biomarker of prognosis that associated with tumor glucose metabolism on FDG PET when compared to existing molecular correlates like LDHA, which was synergistic with NF-κB for outcome. These findings recapitulate radiogenomics profiles previously reported by our group and provide a methodology for studying tumor biology using computational approaches.

    View details for DOI 10.1016/j.lungcan.2013.11.001

    View details for PubMedID 24355259

  • Stromal architecture and periductal decorin are potential prognostic markers for ipsilateral locoregional recurrence in ductal carcinoma in situ of the breast HISTOPATHOLOGY Van Bockstal, M., Lambein, K., Gevaert, O., de Wever, O., Praet, M., Cocquyt, V., Van den Broecke, R., Braems, G., Denys, H., Libbrecht, L. 2013; 63 (4): 520-533

    View details for DOI 10.1111/his.12188

    View details for Web of Science ID 000325088600008

  • Identification of ovarian cancer driver genes by using module network integration of multi-omics data. Interface focus Gevaert, O., Villalobos, V., Sikic, B. I., Plevritis, S. K. 2013; 3 (4): 20130013

    Abstract

    The increasing availability of multi-omics cancer datasets has created a new opportunity for data integration that promises a more comprehensive understanding of cancer. The challenge is to develop mathematical methods that allow the integration and extraction of knowledge from large datasets such as The Cancer Genome Atlas (TCGA). This has led to the development of a variety of omics profiles that are highly correlated with each other; however, it remains unknown which profile is the most meaningful and how to efficiently integrate different omics profiles. We developed AMARETTO, an algorithm to identify cancer drivers by integrating a variety of omics data from cancer and normal tissue. AMARETTO first models the effects of genomic/epigenomic data on disease-specific gene expression. AMARETTO's second step involves constructing a module network to connect the cancer drivers with their downstream targets. We observed that more gene expression variation can be explained when using disease-specific gene expression data. We applied AMARETTO to the ovarian cancer TCGA data and identified several cancer driver genes of interest, including novel genes in addition to known drivers of cancer. Finally, we showed that certain modules are predictive of good versus poor outcome, and the associated drivers were related to DNA repair pathways.

    View details for DOI 10.1098/rsfs.2013.0013

    View details for PubMedID 24511378

  • Cross-Species Functional Analysis of Cancer-Associated Fibroblasts Identifies a Critical Role for CLCF1 and IL-6 in Non-Small Cell Lung Cancer In Vivo CANCER RESEARCH Vicent, S., Sayles, L. C., Vaka, D., Khatri, P., Gevaert, O., Chen, R., Zheng, Y., Gillespie, A. K., Clarke, N., Xu, Y., Shrager, J., Hoang, C. D., Plevritis, S., Butte, A. J., Sweet-Cordero, E. A. 2012; 72 (22): 5744-5756

    Abstract

    Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.

    View details for DOI 10.1158/0008-5472.CAN-12-1097

    View details for PubMedID 22962265

  • Evaluation of a panel of 28 biomarkers for the non-invasive diagnosis of endometriosis HUMAN REPRODUCTION Vodolazkaia, A., El-Aalamat, Y., Popovic, D., Mihalyi, A., Bossuyt, X., Kyama, C. M., Fassbender, A., Bokor, A., Schols, D., Huskens, D., Meuleman, C., Peeraer, K., Tomassetti, C., Gevaert, O., Waelkens, E., Kasran, A., De Moor, B., D'Hooghe, T. M. 2012; 27 (9): 2698-2711

    Abstract

    At present, the only way to conclusively diagnose endometriosis is laparoscopic inspection, preferably with histological confirmation. This contributes to the delay in the diagnosis of endometriosis, which is 6-11 years. So far, non-invasive diagnostic approaches such as ultrasound (US), MRI or blood tests do not have sufficient diagnostic power. Our aim was to develop and validate a non-invasive diagnostic test with a high sensitivity (80% or more) for symptomatic endometriosis patients, without US evidence of endometriosis, since this is the group most in need of a non-invasive test. A total of 28 inflammatory and non-inflammatory plasma biomarkers were measured in 353 EDTA plasma samples collected at surgery from 121 controls without endometriosis at laparoscopy and from 232 women with endometriosis (minimal-mild n = 148; moderate-severe n = 84), including 175 women without preoperative US evidence of endometriosis. Surgery was done during menstrual (n = 83), follicular (n = 135) and luteal (n = 135) phases of the menstrual cycle. For analysis, the data were randomly divided into an independent training (n = 235) and a test (n = 118) data set. Statistical analysis was done using univariate and multivariate (logistic regression and least squares support vector machines (LS-SVM)) approaches in training and test data sets separately to validate our findings. In the training set, two models of four biomarkers (Model 1: annexin V, VEGF, CA-125 and glycodelin; Model 2: annexin V, VEGF, CA-125 and sICAM-1) analysed in plasma, obtained during the menstrual phase, could predict US-negative endometriosis with a high sensitivity (81-90%) and an acceptable specificity (68-81%). The same two models predicted US-negative endometriosis in the independent validation test set with a high sensitivity (82%) and an acceptable specificity (63-75%). In plasma samples obtained during menstruation, multivariate analysis of four biomarkers (annexin V, VEGF, CA-125 and sICAM-1/or glycodelin) enabled the diagnosis of endometriosis undetectable by US with a sensitivity of 81-90% and a specificity of 63-81% in independent training and test data sets. The next step is to apply these models for preoperative prediction of endometriosis in an independent set of patients with infertility and/or pain without US evidence of endometriosis, scheduled for laparoscopy.

    View details for DOI 10.1093/humrep/des234

    View details for Web of Science ID 000307502000016

    View details for PubMedID 22736326

  • Prognostic PET F-18-FDG Uptake Imaging Features Are Associated with Major Oncogenomic Alterations in Patients with Resected Non-Small Cell Lung Cancer CANCER RESEARCH Nair, V. S., Gevaert, O., Davidzon, G., Napel, S., Graves, E. E., Hoang, C. D., Shrager, J. B., Quon, A., Rubin, D. L., Plevritis, S. K. 2012; 72 (15): 3725-3734

    Abstract

    Although 2[18F]fluoro-2-deoxy-d-glucose (FDG) uptake during positron emission tomography (PET) predicts post-surgical outcome in patients with non-small cell lung cancer (NSCLC), the biologic basis for this observation is not fully understood. Here, we analyzed 25 tumors from patients with NSCLCs to identify tumor PET-FDG uptake features associated with gene expression signatures and survival. Fourteen quantitative PET imaging features describing FDG uptake were correlated with gene expression for single genes and coexpressed gene clusters (metagenes). For each FDG uptake feature, an associated metagene signature was derived, and a prognostic model was identified in an external cohort and then tested in a validation cohort of patients with NSCLC. Four of eight single genes associated with FDG uptake (LY6E, RNF149, MCM6, and FAP) were also associated with survival. The most prognostic metagene signature was associated with a multivariate FDG uptake feature [maximum standard uptake value (SUV(max)), SUV(variance), and SUV(PCA2)], each highly associated with survival in the external [HR, 5.87; confidence interval (CI), 2.49-13.8] and validation (HR, 6.12; CI, 1.08-34.8) cohorts, respectively. Cell-cycle, proliferation, death, and self-recognition pathways were altered in this radiogenomic profile. Together, our findings suggest that leveraging tumor genomics with an expanded collection of PET-FDG imaging features may enhance our understanding of FDG uptake as an imaging biomarker beyond its association with glycolysis.

    View details for DOI 10.1158/0008-5472.CAN-11-3943

    View details for PubMedID 22710433

  • Combined mRNA microarray and proteomic analysis of eutopic endometrium of women with and without endometriosis. Human reproduction (Oxford, England) Fassbender, A., Verbeeck, N., Börnigen, D., Kyama, C. M., Bokor, A., Vodolazkaia, A., Peeraer, K., Tomassetti, C., Meuleman, C., Gevaert, O., Van de Plas, R., Ojeda, F., De Moor, B., Moreau, Y., Waelkens, E., D'Hooghe, T. M. 2012; 27 (7): 2020-2029

    Abstract

    An early semi-invasive diagnosis of endometriosis has the potential to allow early treatment and minimize disease progression but no such test is available at present. Our aim was to perform a combined mRNA microarray and proteomic analysis on the same eutopic endometrium sample obtained from patients with and without endometriosis. mRNA and protein fractions were extracted from 49 endometrial biopsies obtained from women with laparoscopically proven presence (n= 31) or absence (n= 18) of endometriosis during the early luteal (n= 27) or menstrual phase (n= 22) and analyzed using microarray and proteomic surface enhanced laser desorption ionization-time of flight mass spectrometry, respectively. Proteomic data were analyzed using a least squares-support vector machines (LS-SVM) model built on 70% (training set) and 30% of the samples (test set). mRNA analysis of eutopic endometrium did not show any differentially expressed genes in women with endometriosis when compared with controls, regardless of endometriosis stage or cycle phase. mRNA was differentially expressed (P< 0.05) in women with (925 genes) and without endometriosis (1087 genes) during the menstrual phase when compared with the early luteal phase. Proteomic analysis based on five peptide peaks [2072 mass/charge (m/z); 2973 m/z; 3623 m/z; 3680 m/z and 21133 m/z] using an LS-SVM model applied on the luteal phase endometrium training set allowed the diagnosis of endometriosis (sensitivity, 91; 95% confidence interval (CI): 74-98; specificity, 80; 95% CI: 66-97 and positive predictive value, 87.9%; negative predictive value, 84.8%) in the test set. mRNA expression of eutopic endometrium was comparable in women with and without endometriosis but different in menstrual endometrium when compared with luteal endometrium in women with endometriosis. Proteomic analysis of luteal phase endometrium allowed the diagnosis of endometriosis with high sensitivity and specificity in training and test sets. A potential limitation of our study is the fact that our control group included women with a normal pelvis as well as women with concurrent pelvic disease (e.g. fibroids, benign ovarian cysts, hydrosalpinges), which may have contributed to the comparable mRNA expression profile in the eutopic endometrium of women with endometriosis and controls.

    View details for DOI 10.1093/humrep/des127

    View details for PubMedID 22556377

  • Proteomics Analysis of Plasma for Early Diagnosis of Endometriosis OBSTETRICS AND GYNECOLOGY Fassbender, A., Waelkens, E., Verbeeck, N., Kyama, C. M., Bokor, A., Vodolazkaia, A., Van De Plas, R., Meuleman, C., Peeraer, K., Tomassetti, C., Gevaert, O., Ojeda, F., De Moor, B., D'Hooghe, T. 2012; 119 (2): 276-285

    Abstract

    To test the hypothesis that differential surface-enhanced laser desorption/ionization time-of-flight mass spectrometry protein or peptide expression in plasma can be used in infertile women with or without pelvic pain to predict the presence of laparoscopically and histologically confirmed endometriosis, especially in the subpopulation with a normal preoperative gynecologic ultrasound examination. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry analysis was performed on 254 plasma samples obtained from 89 women without endometriosis and 165 women with endometriosis (histologically confirmed) undergoing laparoscopies for infertility with or without pelvic pain. Data were analyzed using least squares support vector machines and were divided randomly (100 times) into a training data set (70%) and a test data set (30%). Minimal-to-mild endometriosis was best predicted (sensitivity 75%, 95% confidence interval [CI] 63-89; specificity 86%, 95% CI 71-94; positive predictive value 83.6%, negative predictive value 78.3%) using a model based on five peptide and protein peaks (range 4.898-14.698 m/z) in menstrual phase samples. Moderate-to-severe endometriosis was best predicted (sensitivity 98%, 95% CI 84-100; specificity 81%, 95% CI 67-92; positive predictive value 74.4%, negative predictive value 98.6%) using a model based on five other peptide and protein peaks (range 2.189-7.457 m/z) in luteal phase samples. The peak with the highest intensity (2.189 m/z) was identified as a fibrinogen β-chain peptide. Ultrasonography-negative endometriosis was best predicted (sensitivity 88%, 95% CI 73-100; specificity 84%, 95% CI 71-96) using a model based on five peptide peaks (range 2.058-42.065 m/z) in menstrual phase samples. A noninvasive test using proteomic analysis of plasma samples obtained during the menstrual phase enabled the diagnosis of endometriosis undetectable by ultrasonography with high sensitivity and specificity. Level of evidence: II.

    View details for DOI 10.1097/AOG.0b013e31823fda8d

    View details for Web of Science ID 000299604300012

    View details for PubMedID 22270279

  • Atypical Neurofibromas in Neurofibromatosis Type 1 are Premalignant Tumors GENES CHROMOSOMES & CANCER Beert, E., Brems, H., Daniels, B., De Wever, I., Van Calenbergh, F., Schoenaers, J., Debiec-Rychter, M., Gevaert, O., De Raedt, T., Van den Bruel, A., de Ravel, T., Cichowski, K., Kluwe, L., Mautner, V., Sciot, R., Legius, E. 2011; 50 (12): 1021-1032

    Abstract

    Benign peripheral nerve sheath tumors (PNSTs) are a characteristic feature of neurofibromatosis type I (NF1) patients. NF1 individuals have an 8-13% lifetime risk of developing a malignant PNST (MPNST). Atypical neurofibromas are symptomatic, hypercellular PNSTs, composed of cells with hyperchromatic nuclei in the absence of mitoses. Little is known about the origin and nature of atypical neurofibromas in NF1 patients. In this study, we classified the atypical neurofibromas in the spectrum of NF1-associated PNSTs by analyzing 65 tumor samples from 48 NF1 patients. We compared tumor-specific chromosomal copy number alterations between benign neurofibromas, atypical neurofibromas, and MPNSTs (low-, intermediate-, and high-grade) by karyotyping and microarray-based comparative genome hybridization (aCGH). In 15 benign neurofibromas (4 subcutaneous and 11 plexiform), no copy number alterations were found, except a single event in a plexiform neurofibroma. One highly significant recurrent aberration (15/16) was identified in the atypical neurofibromas, namely a deletion with a minimal overlapping region (MOR) in chromosome band 9p21.3, including CDKN2A and CDKN2B. Copy number loss of the CDKN2A/B gene locus was one of the most common events in the group of MPNSTs, with deletions in low-, intermediate-, and high-grade MPNSTs. In one tumor, we observed a clear transition from a benign-atypical neurofibroma toward an intermediate-grade MPNST, confirmed by both histopathology and aCGH analysis. These data support the hypothesis that atypical neurofibromas are premalignant tumors, with the CDKN2A/B deletion as the first step in the progression toward MPNST.

    View details for DOI 10.1002/gcc.20921

    View details for Web of Science ID 000296443600005

    View details for PubMedID 21987445

  • Prediction of lymph node involvement in breast cancer from primary tumor tissue using gene expression profiling and miRNAs BREAST CANCER RESEARCH AND TREATMENT Smeets, A., Daemen, A., Vanden Bempt, I., Gevaert, O., Claes, B., Wildiers, H., Drijkoningen, R., Van Hummelen, P., Lambrechts, D., De Moor, B., Neven, P., Sotiriou, C., Vandorpe, T., Paridaens, R., Christiaens, M. R. 2011; 129 (3): 767-776

    Abstract

    The aim of this study was to investigate whether lymph node involvement in breast cancer is influenced by gene or miRNA expression of the primary tumor. For this purpose, we selected a very homogeneous patient population to minimize heterogeneity in other tumor and patient characteristics. First, we compared gene expression profiles of primary tumor tissue from a group of 96 breast cancer patients balanced for lymph node involvement using Affymetrix Human U133 Plus 2.0 microarray chip. A model was built by weighted Least-Squares Support Vector Machines and validated on an internal and external dataset. Next, miRNA profiling was performed on a subset of 82 tumors using Human MiRNA-microarray chips (Illumina). Finally, for each miRNA the number of significant inverse correlated targets was determined and compared with 1000 sets of randomly chosen targets. A model based on 241 genes was built (AUC 0.66). The AUC for the internal dataset was 0.646 and 0.651 for the external datasets. The model includes multiple kinases, apoptosis-related, and zinc ion-binding genes. Integration of the microarray and miRNA data reveals ten miRNAs suppressing lymph node invasion and one miRNA promoting lymph node invasion. Our results provide evidence that measurable differences in gene and miRNA expression exist between node negative and node positive patients and thus that lymph node involvement is not a genetically random process. Moreover, our data suggest a general deregulation of the miRNA machinery that is potentially responsible for lymph node invasion.

    View details for DOI 10.1007/s10549-010-1265-5

    View details for Web of Science ID 000294680600010

    View details for PubMedID 21116709

  • Ectopic pregnancy: using the hCG ratio to select women for expectant or medical management ACTA OBSTETRICIA ET GYNECOLOGICA SCANDINAVICA Kirk, E., Van Calster, B., Condous, G., Papageorghiou, A. T., Gevaert, O., Van Huffel, S., De Moor, B., Timmerman, D., Bourne, T. 2011; 90 (3): 264-272

    Abstract

    To identify variables that can be used to select women with an ectopic pregnancy for expectant or medical management with systemic methotrexate. Cohort study. Early Pregnancy Unit of a London teaching hospital. Women with a tubal ectopic pregnancy managed non-surgically. The diagnosis of tubal ectopic pregnancy was made using transvaginal sonography. Human chorionic gonadotrophin (hCG) levels had to be taken at 0 hour and 48 hours pre-treatment. Other recorded variables include presenting complaints, gestational age, progesterone levels, size of the ectopic mass and appearance of the ectopic on transvaginal sonography. Women were followed up until the outcome (success or failure) of management was known. Univariable analysis was performed to identify the variables associated with successful management using area under curves and relative risks. Thirty-nine women underwent expectant management (overall success rate 71.8%) and 42 had medical management (overall success rate 76.2%). The pre-treatment hCG ratio (hCG 48 hours/hCG 0 hour) was related to the failure of both expectant (area under curve 0.86, 95% CI 0.67-0.94) and medical (area under curve 0.79, 95% CI 0.58-0.90) management. History of ectopic pregnancy was related to failure of expectant management only (relative risk 0.46, 95% CI 0.16-0.92). The most important variable for predicting the likelihood of successful non-surgical management was the pre-treatment hCG ratio. New studies are required to validate the use of this variable and of history of ectopic pregnancy to predict the likelihood of successful non-surgical management in clinical practice.

    View details for DOI 10.1111/j.1600-0412.2010.01053.x

    View details for Web of Science ID 000288825600010

    View details for PubMedID 21306315

  • Evaluation of endometrial biomarkers for semi-invasive diagnosis of endometriosis FERTILITY AND STERILITY Kyama, C. M., Mihalyi, A., Gevaert, O., Waelkens, E., Simsa, P., Van De Plas, R., Meuleman, C., De Moor, B., D'Hooghe, T. M. 2011; 95 (4): 1338-U173

    Abstract

    To test the hypothesis that specific proteins and peptides are expressed differentially in eutopic endometrium of women with and without endometriosis and at specific stages of the disease (minimal, mild, moderate, or severe) during the secretory phase. Patients with endometriosis were compared with controls. University hospital. A total of 29 patients during the secretory phase were selected for this study on the basis of cycle phase and presence or absence of endometriosis. Endometriosis was confirmed laparoscopically and histologically in 19 patients with endometriosis of revised American Society for Reproductive Medicine stages (9 minimal-mild and 10 moderate-severe), and the presence of a normal pelvis was documented by laparoscopy in 10 controls. Protein expression of endometrium was evaluated with use of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. The differential expression of protein mass peaks was analyzed with use of support vector machine algorithms and logistic regression models. Data preprocessing resulted in differential expression of 73, 30, and 131 mass peaks between controls and patients with endometriosis (all stages), with minimal-mild endometriosis, and with moderate-severe endometriosis, respectively. Endometriosis was diagnosed with high sensitivity (89.5%) and specificity (90%) with use of five down-regulated mass peaks (1.949 kDa, 5.183 kDa, 8.650 kDa, 8.659 kDa, and 13.910 kDa) obtained after support vector machine ranking and logistic regression classification. With use of a similar analysis, minimal-mild endometriosis was diagnosed with four mass peaks (two up-regulated: 35.956 kDa and 90.675 kDa and two down-regulated: 1.924 kDa and 2.504 kDa) with maximal sensitivity (100%) and specificity (100%). The 90.675-kDa and 35.956-kDa mass peaks were identified as T-plastin and annexin V, respectively. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry analysis of secretory phase endometrium combined with bioinformatics puts forward a prospective panel of potential biomarkers with sensitivity of 100% and specificity of 100% for the diagnosis of minimal to mild endometriosis.

    View details for DOI 10.1016/j.fertnstert.2010.06.084

    View details for Web of Science ID 000288010900024

    View details for PubMedID 20800833

  • TRIzol treatment of secretory phase endometrium allows combined proteomic and mRNA microarray analysis of the same sample in women with and without endometriosis REPRODUCTIVE BIOLOGY AND ENDOCRINOLOGY Fassbender, A., Simsa, P., Kyama, C. M., Waelkens, E., Mihalyi, A., Meuleman, C., Gevaert, O., Van De Plas, R., De Moor, B., D'Hooghe, T. M. 2010; 8

    Abstract

    According to mRNA microarray, proteomics and other studies, biological abnormalities of eutopic endometrium (EM) are involved in the pathogenesis of endometriosis, but the relationship between mRNA and protein expression in EM is not clear. We tested for the first time the hypothesis that EM TRIzol extraction allows proteomic Surface Enhanced Laser Desorption/Ionisation Time-of-Flight Mass Spectrometry (SELDI-TOF MS) analysis and that these proteomic data can be related to mRNA (microarray) data obtained from the same EM sample from women with and without endometriosis. Proteomic analysis was performed using SELDI-TOF MS of TRIzol-extracted EM obtained during secretory phase from patients without endometriosis (n = 6), patients with minimal-mild (n = 5) and with moderate-severe endometriosis (n = 5), classified according to the system of the American Society of Reproductive Medicine. Proteomic data were compared to mRNA microarray data obtained from the same EM samples. In our SELDI-TOF MS study 32 peaks were differentially expressed in endometrium of all women with endometriosis (stages I-IV) compared with all controls during the secretory phase. Comparison of proteomic results with those from microarray revealed no corresponding genes/proteins. TRIzol treatment of secretory phase EM allows combined proteomic and mRNA microarray analysis of the same sample, but comparison between proteomic and microarray data was not evident, probably due to post-translational modifications.

    View details for DOI 10.1186/1477-7827-8-123

    View details for Web of Science ID 000284485100001

    View details for PubMedID 20964823

  • A Seven-Gene Set Associated with Chronic Hypoxia of Prognostic Importance in Hepatocellular Carcinoma CLINICAL CANCER RESEARCH Van Malenstein, H., Gevaert, O., Libbrecht, L., Daemen, A., Allemeersch, J., Nevens, F., Van Cutsem, E., Cassiman, D., De Moor, B., Verslype, C., van Pelt, J. 2010; 16 (16): 4278-4288

    Abstract

    Hepatocellular carcinomas (HCC) have an unpredictable clinical course, and molecular classification could provide better insights into prognosis and patient-directed therapy. We hypothesized that in HCC, certain microenvironmental regions exist with a characteristic gene expression related to chronic hypoxia which would induce aggressive behavior. We determined the gene expression pattern for human HepG2 liver cells under chronic hypoxia by microarray analysis. Differentially expressed genes were selected and their clinical values were assessed. In our hypothesis-driven analysis, we included available independent microarray studies of patients with HCC in one single analysis. Three microarray studies encompassing 272 patients were used as training sets to determine a minimal prognostic gene set, and one recent study of 91 patients was used for validation. Using computational methods, we identified seven genes (out of 3,592 differentially expressed under chronic hypoxia) that showed correlation with poor prognostic indicators in all three training sets (65/139/73 patients) and this was validated in a fourth data set (91 patients). Retrospectively, the seven-gene set was associated with poor survival (hazard ratio, 1.39; P = 0.007) and early recurrence (hazard ratio, 2.92; P = 0.007) in 135 patients. Moreover, using a hypoxia score based on this seven-gene set, we found that patients with a score of >0.35 (n = 42) had a median survival of 307 days, whereas patients with a score of ≤0.35 (n = 93) had a median survival of 1,602 days (P = 0.005). We identified a unique, liver-specific, seven-gene signature associated with chronic hypoxia that correlates with poor prognosis in HCCs.

    View details for DOI 10.1158/1078-0432.CCR-09-3274

    View details for Web of Science ID 000280830300024

    View details for PubMedID 20592013

  • Improved Microarray-Based Decision Support with Graph Encoded Interactome Data PLOS ONE Daemen, A., Signoretto, M., Gevaert, O., Suykens, J. A., De Moor, B. 2010; 5 (4)

    Abstract

    In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting (microRNA.org) outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method.

    View details for DOI 10.1371/journal.pone.0010225

    View details for Web of Science ID 000276853800015

    View details for PubMedID 20419106

  • Non-invasive diagnosis of endometriosis based on a combined analysis of six plasma biomarkers HUMAN REPRODUCTION Mihalyi, A., Gevaert, O., Kyama, C. M., Simsa, P., Pochet, N., De Smet, F., De Moor, B., Meuleman, C., Billen, J., Blanckaert, N., Vodolazkaia, A., Fulop, V., D'Hooghe, T. M. 2010; 25 (3): 654-664

    Abstract

    Lack of a non-invasive diagnostic test contributes to the long delay between onset of symptoms and diagnosis of endometriosis. The aim of this study was to evaluate the combined performance of six potential plasma biomarkers in the diagnosis of endometriosis. This case-control study was conducted in 294 infertile women, consisting of 93 women with a normal pelvis and 201 women with endometriosis. We measured plasma concentrations of interleukin (IL)-6, IL-8, tumour necrosis factor-alpha, high-sensitivity C-reactive protein (hsCRP), and cancer antigens CA-125 and CA-19-9. Analyses were done using the Kruskal-Wallis test, Mann-Whitney test, receiver operator characteristic, stepwise logistic regression and least squares support vector machines (LSSVM). Plasma levels of IL-6, IL-8 and CA-125 were increased in all women with endometriosis and in those with minimal-mild endometriosis, compared with controls. In women with moderate-severe endometriosis, plasma levels of IL-6, IL-8 and CA-125, but also of hsCRP, were significantly higher than in controls. Using stepwise logistic regression, moderate-severe endometriosis was diagnosed with a sensitivity of 100% (specificity 84%) and minimal-mild endometriosis was detected with a sensitivity of 87% (specificity 71%) during the secretory phase. Using LSSVM analysis, minimal-mild endometriosis was diagnosed with a sensitivity of 94% (specificity 61%) during the secretory phase and with a sensitivity of 92% (specificity 63%) during the menstrual phase. Advanced statistical analysis of a panel of six selected plasma biomarkers on samples obtained during the secretory phase or during menstruation allows the diagnosis of both minimal-mild and moderate-severe endometriosis with high sensitivity and clinically acceptable specificity.

    View details for DOI 10.1093/humrep/dep425

    View details for Web of Science ID 000274490700014

    View details for PubMedID 20007161

  • A taxonomy of epithelial human cancer and their metastases BMC MEDICAL GENOMICS Gevaert, O., Daemen, A., De Moor, B., Libbrecht, L. 2009; 2

    Abstract

    Microarray technology has allowed to molecularly characterize many different cancer sites. This technology has the potential to individualize therapy and to discover new drug targets. However, due to technological differences and issues in standardized sample collection no study has evaluated the molecular profile of epithelial human cancer in a large number of samples and tissues. Additionally, it has not yet been extensively investigated whether metastases resemble their tissue of origin or tissue of destination. We studied the expression profiles of a series of 1566 primary and 178 metastases by unsupervised hierarchical clustering. The clustering profile was subsequently investigated and correlated with clinico-pathological data. Statistical enrichment of clinico-pathological annotations of groups of samples was investigated using Fisher exact test. Gene set enrichment analysis (GSEA) and DAVID functional enrichment analysis were used to investigate the molecular pathways. Kaplan-Meier survival analysis and log-rank tests were used to investigate prognostic significance of gene signatures. Large clusters corresponding to breast, gastrointestinal, ovarian and kidney primary tissues emerged from the data. Chromophobe renal cell carcinoma clustered together with follicular differentiated thyroid carcinoma, which supports recent morphological descriptions of thyroid follicular carcinoma-like tumors in the kidney and suggests that they represent a subtype of chromophobe carcinoma. We also found an expression signature identifying primary tumors of squamous cell histology in multiple tissues. Next, a subset of ovarian tumors enriched with endometrioid histology clustered together with endometrium tumors, confirming that they share their etiopathogenesis, which strongly differs from serous ovarian tumors. In addition, the clustering of colon and breast tumors correlated with clinico-pathological characteristics. Moreover, a signature was developed based on our unsupervised clustering of breast tumors and this was predictive for disease-specific survival in three independent studies. Next, the metastases from ovarian, breast, lung and vulva cluster with their tissue of origin while metastases from colon showed a bimodal distribution. A significant part clusters with tissue of origin while the remaining tumors cluster with the tissue of destination. Our molecular taxonomy of epithelial human cancer indicates surprising correlations over tissues. This may have a significant impact on the classification of many cancer sites and may guide pathologists, both in research and daily practice. Moreover, these results based on unsupervised analysis yielded a signature predictive of clinical outcome in breast cancer. Additionally, we hypothesize that metastases from gastrointestinal origin either remember their tissue of origin or adapt to the tissue of destination. More specifically, colon metastases in the liver show strong evidence for such a bimodal tissue specific profile.

    View details for DOI 10.1186/1755-8794-2-69

    View details for Web of Science ID 000273595600001

    View details for PubMedID 20017941

  • Density of small diameter sensory nerve fibres in endometrium: a semi-invasive diagnostic test for minimal to mild endometriosis HUMAN REPRODUCTION Bokor, A., Kyama, C. M., Vercruysse, L., Fassbender, A., Gevaert, O., Vodolazkaia, A., De Moor, B., Fulop, V., D'Hooghe, T. 2009; 24 (12): 3025-3032

    Abstract

    The aim of our study was to test the hypothesis that multiple-sensory small-diameter nerve fibres are present in a higher density in endometrium from patients with endometriosis when compared with women with a normal pelvis, enabling the development of a semi-invasive diagnostic test for minimal-mild endometriosis. Secretory phase endometrium samples (n = 40), obtained from women with laparoscopically/histologically confirmed minimal-mild endometriosis (n = 20) and from women with a normal pelvis (n = 20) were selected from the biobank at the Leuven University Fertility Centre. Immunohistochemistry was performed to localize neural markers for sensory C, Adelta, adrenergic and cholinergic nerve fibres in the functional layer of the endometrium. Sections were immunostained with anti-human protein gene product 9.5 (PGP9.5), anti-neurofilament protein, anti-substance P (SP), anti-vasoactive intestinal peptide (VIP), anti-neuropeptide Y and anti-calcitonine gene-related polypeptide. Statistical analysis was done using the Mann-Whitney U-test, receiver operator characteristic analysis, stepwise logistic regression and least-squares support vector machines. The density of small nerve fibres was approximately 14 times higher in endometrium from patients with minimal-mild endometriosis (1.96 +/- 2.73) when compared with women with a normal pelvis (0.14 +/- 0.46, P < 0.0001). The combined analysis of neural markers PGP9.5, VIP and SP could predict the presence of minimal-mild endometriosis with 95% sensitivity, 100% specificity and 97.5% accuracy. To confirm our findings, prospective studies are required.

    View details for DOI 10.1093/humrep/dep283

    View details for Web of Science ID 000272069500009

    View details for PubMedID 19690351

  • Recurrent Copy Number Alterations in BRCA1-Mutated Ovarian Tumors Alter Biological Pathways HUMAN MUTATION Leunen, K., Gevaert, O., Daemen, A., Vanspauwen, V., Michils, G., De Moor, B., Moerman, P., Vergote, I., Legius, E. 2009; 30 (12): 1693-1702

    Abstract

    Array CGH was used to identify recurrent copy number alterations (RCNA) characteristic of either BRCA1-related or sporadic ovarian cancer. After preprocessing, both groups of patients were modeled using a recurrent Hidden Markov Model to detect RCNA. RCNA with a probability higher than 80% were called. After removing RCNA present in both groups, the genes present in the remaining RCNA were investigated for enrichment of pathways from external databases. More RCNA were observed in the BRCA1 group, and they display more losses than gains compared to the sporadic group. When focusing on the type of RCNA, no significant difference in length was seen for the gains, but there was a statistically significant difference for the losses. In the sporadic group, a great proportion of the altered regions contain genes known to have a function in cell adhesion and complement activation, whereas the BRCA1 samples are characterized by alterations in the HOX genes, metalloproteinases, tumor suppressor genes, and the estrogen-signaling pathways. We conclude that BRCA1 ovarian tumors present a different type, number, and length of RCNA; a huge amount of the genome is lost, resulting in important genomic instability. Moreover, important biological pathways are altered differentially when compared to the sporadic group.

    View details for DOI 10.1002/humu.21135

    View details for Web of Science ID 000272796400011

    View details for PubMedID 19802895

  • Intrinsic Gene Expression Profiles of Gliomas Are a Better Predictor of Survival than Histology CANCER RESEARCH Gravendeel, L. A., Kouwenhoven, M. C., Gevaert, O., de Rooi, J. J., Stubbs, A. P., Duijm, J. E., Daemen, A., Bleeker, F. E., Bralten, L. B., Kloosterhof, N. K., De Moor, B., Eilers, P. H., van der Spek, P. J., Kros, J. M., Smitt, P. A., van den Bent, M. J., French, P. J. 2009; 69 (23): 9065-9072

    Abstract

    Gliomas are the most common primary brain tumors with heterogeneous morphology and variable prognosis. Treatment decisions in patients rely mainly on histologic classification and clinical parameters. However, differences between histologic subclasses and grades are subtle, and classifying gliomas is subject to a large interobserver variability. To improve current classification standards, we have performed gene expression profiling on a large cohort of glioma samples of all histologic subtypes and grades. We identified seven distinct molecular subgroups that correlate with survival. These include two favorable prognostic subgroups (median survival, >4.7 years), two with intermediate prognosis (median survival, 1-4 years), two with poor prognosis (median survival, <1 year), and one control group. The intrinsic molecular subtypes of glioma are different from histologic subgroups and correlate better to patient survival. The prognostic value of molecular subgroups was validated on five independent sample cohorts (The Cancer Genome Atlas, Repository for Molecular Brain Neoplasia Data, GSE12907, GSE4271, and Li and colleagues). The power of intrinsic subtyping is shown by its ability to identify a subset of prognostically favorable tumors within an external data set that contains only histologically confirmed glioblastomas (GBM). Specific genetic changes (epidermal growth factor receptor amplification, IDH1 mutation, and 1p/19q loss of heterozygosity) segregate in distinct molecular subgroups. We identified a subgroup with molecular features associated with secondary GBM, suggesting that different genetic changes drive gene expression profiles. Finally, we assessed response to treatment in molecular subgroups. Our data provide compelling evidence that expression profiling is a more accurate and objective method to classify gliomas than histologic classification. Molecular classification therefore may aid diagnosis and can guide clinical decision making.

    View details for DOI 10.1158/0008-5472.CAN-09-2307

    View details for Web of Science ID 000272362800029

    View details for PubMedID 19920198

  • Molecular Response to Cetuximab and Efficacy of Preoperative Cetuximab-Based Chemoradiation in Rectal Cancer 44th Annual Meeting of the American-Society-of-Clinical-Oncology (ASCO) Debucquoy, A., Haustermans, K., Daemen, A., Aydin, S., Libbrecht, L., Gevaert, O., De Moor, B., Tejpar, S., McBride, W. H., Penninckx, F., Scalliet, P., Stroh, C., Vlassak, S., Sempoux, C., Machiels, J. AMER SOC CLINICAL ONCOLOGY. 2009: 2751–57

    Abstract

    To characterize the molecular pathways activated or inhibited by cetuximab when combined with chemoradiotherapy (CRT) in rectal cancer and to identify molecular profiles and biomarkers that might improve patient selection for such treatments. Forty-one patients with rectal cancer (T3-4 and/or N+) received preoperative radiotherapy (1.8 Gy, 5 days/wk, 45 Gy) in combination with capecitabine and cetuximab (400 mg/m2 as initial dose 1 week before CRT followed by 250 mg/m2/wk for 5 weeks). Biopsies and plasma samples were taken before treatment, after cetuximab but before CRT, and at the time of surgery. Proteomics and microarrays were used to monitor the molecular response to cetuximab and to identify profiles and biomarkers to predict treatment efficacy. Cetuximab on its own downregulated genes involved in proliferation and invasion and upregulated inflammatory gene expression, with 16 genes being significantly influenced in microarray analysis. The decrease in proliferation was confirmed by immunohistochemistry for Ki67 (P = .01) and was accompanied by an increase in transforming growth factor-alpha in plasma samples (P < .001). Disease-free survival (DFS) was better in patients if epidermal growth factor receptor expression was upregulated in the tumor after the initial cetuximab dose (P = .02) and when fibro-inflammatory changes were present in the surgical specimen (P = .03). Microarray and proteomic profiles were predictive of DFS. Our study showed that a single dose of cetuximab has a significant impact on the expression of genes involved in tumor proliferation and inflammation. We identified potential biomarkers that might predict response to cetuximab-based CRT.

    View details for DOI 10.1200/JCO.2008.18.5033

    View details for Web of Science ID 000266782100005

    View details for PubMedID 19332731

  • Prediction of cancer outcome using DNA microarray technology: past, present and future. Expert opinion on medical diagnostics Gevaert, O., De Moor, B. 2009; 3 (2): 157-165

    Abstract

    Background: The use of DNA microarray technology to predict cancer outcome already has a history of almost a decade. Although many breakthroughs have been made, the promise of individualized therapy is still not fulfilled. In addition, new technologies are emerging that also show promise in outcome prediction of cancer patients. Objective: The impact of DNA microarray and other 'omics' technologies on the outcome prediction of cancer patients was investigated. Whether integration of omics data results in better predictions was also examined. Methods: DNA microarray technology was focused on as a starting point because this technology is considered to be the most mature technology from all omics technologies. Next, emerging technologies that may accomplish the same goals but have been less extensively studied are described. Conclusion: Besides DNA microarray technology, other omics technologies have shown promise in predicting the cancer outcome or have potential to replace microarray technology in the near future. Moreover, it is shown that integration of multiple omics data can result in better predictions of cancer outcome; but, owing to the lack of comprehensive studies, validation studies are required to verify which omics has the most information and whether a combination of multiple omics data improves predictive performance.

    View details for DOI 10.1517/17530050802680172

    View details for PubMedID 23485162

  • A kernel-based integration of genome-wide data for clinical decision support. Genome medicine Daemen, A., Gevaert, O., Ojeda, F., Debucquoy, A., Suykens, J. A., Sempoux, C., Machiels, J., Haustermans, K., De Moor, B. 2009; 1 (4): 39-?

    Abstract

    Although microarray technology allows the investigation of the transcriptomic make-up of a tumor in one experiment, the transcriptome does not completely reflect the underlying biology due to alternative splicing, post-translational modifications, as well as the influence of pathological conditions (for example, cancer) on transcription and translation. This increases the importance of fusing more than one source of genome-wide data, such as the genome, transcriptome, proteome, and epigenome. The current increase in the amount of available omics data emphasizes the need for a methodological integration framework. We propose a kernel-based approach for clinical decision support in which many genome-wide data sources are combined. Integration occurs within the patient domain at the level of kernel matrices before building the classifier. As supervised classification algorithm, a weighted least squares support vector machine is used. We apply this framework to two cancer cases, namely, a rectal cancer data set containing microarray and proteomics data and a prostate cancer data set containing microarray and genomics data. For both cases, multiple outcomes are predicted. For the rectal cancer outcomes, the highest leave-one-out (LOO) areas under the receiver operating characteristic curves (AUC) were obtained when combining microarray and proteomics data gathered during therapy and ranged from 0.927 to 0.987. For prostate cancer, all four outcomes had a better LOO AUC when combining microarray and genomics data, ranging from 0.786 for recurrence to 0.987 for metastasis. For both cancer sites the prediction of all outcomes improved when more than one genome-wide data set was considered. This suggests that integrating multiple genome-wide data sources increases the predictive performance of clinical decision support models. This emphasizes the need for comprehensive multi-modal data. We acknowledge that, in a first phase, this will substantially increase costs; however, this is a necessary investment to ultimately obtain cost-efficient models usable in patient tailored therapy.

    View details for DOI 10.1186/gm39

    View details for PubMedID 19356222

  • Building decision trees for diagnosing intracavitary uterine pathology. Facts, views & vision in ObGyn Van den Bosch, T., Daemen, A., Gevaert, O., De Moor, B., Timmerman, D. 2009; 1 (3): 182-188

    Abstract

    To build decision trees to predict intrauterine disease, based on a clinical data set, and using mathematical software. Diagnostic algorithms were built and validated using the data of 402 consecutive patients who underwent grey scale ultrasound, followed by colour Doppler, saline infusion sonography (SIS), office hysteroscopy and endometrial sampling. The "final diagnosis" was classified as "abnormal" in case of endometrial polyps, hyperplasia or malignancy or intracavitary myoma. "Pre-test parameters" included patient's age, weight, length, parity, menopausal status, bleeding symptoms and cervical cytology; "post-test parameters" included ultrasound, colour Doppler, SIS and hysteroscopy findings and histology results after endometrial sampling. Decision Tree #1 was built using both "pre-test" and "post-test" parameters; Tree #2 was only based on "post-test" parameters; Tree #3 was designed without using the hysteroscopy variables. The Waikato Environment for Knowledge Analysis (Weka) software was used for the development of decision trees. All trees started with an imaging technique: hysteroscopy or SIS. The diagnostic accuracy was 88.3%, 88.3% and 84.0% for Tree #1, #2 and #3 respectively; the sensitivity and specificity were 95.5% and 82.0%, 97.7% and 80.0%, and 93.2% and 76.0%, respectively. The method used in this study enables the comparison between different decision trees containing multiple tests.

    View details for PubMedID 25489463

  • A kernel-based integration of genome-wide data for clinical decision support GENOME MEDICINE Daemen, A., Gevaert, O., Ojeda, F., Debucquoy, A., Suykens, J. A., Sempoux, C., Machiels, J., Haustermans, K., De Moor, B. 2009; 1

    View details for DOI 10.1186/gm39

    View details for Web of Science ID 000208627000039

  • SUPERVISED CLASSIFICATION OF ARRAY CGH DATA WITH HMM-BASED FEATURE SELECTION Pacific Symposium on Biocomputing Daemen, A., Gevaert, O., Leunen, K., Legius, E., Vergote, I., De Moor, B. WORLD SCIENTIFIC PUBL CO PTE LTD. 2009: 468–479

    Abstract

    For different tumour types, extended knowledge about the molecular mechanisms involved in tumorigenesis is lacking. Looking for copy number variations (CNV) by Comparative Genomic Hybridization (CGH) can, however, help to determine key elements in this tumorigenesis. As genome-wide array CGH gives the opportunity to evaluate CNV at high resolution, this leads to a huge amount of data, necessitating adequate mathematical methods to carefully select and interpret these data. Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were gathered using recurrent hidden Markov Models (HMM). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method which takes unbalancedness of data sets into account, resulted in leave-one-out or 10-fold cross-validation accuracies ranging from 88 to 95.5%. The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach limits the chromosomal regions that are necessary to classify patients according to cancer subtype.

    View details for Web of Science ID 000263639700045

    View details for PubMedID 19209723

  • Pain experienced during transvaginal ultrasound, saline contrast sonohysterography, hysteroscopy and office sampling: a comparative study ULTRASOUND IN OBSTETRICS & GYNECOLOGY Van den Bosch, T., Verguts, J., Daemen, A., Gevaert, O., Domali, E., Claerhout, F., Vandenbroucke, V., De Moor, B., Deprest, J., Timmerman, D. 2008; 31 (3): 346-351

    Abstract

    To evaluate and compare the pain experienced by women during transvaginal ultrasound, saline contrast sonohysterography (SCSH), diagnostic hysteroscopy and office sampling. This was a descriptive study of 402 consecutive patients presenting at a 'one-stop' Bleeding Clinic between October 2004 and November 2006. Thirty-nine percent of the patients were postmenopausal. The patients underwent the following examinations transvaginally: first ultrasound with color Doppler, second SCSH, third diagnostic hysteroscopy and fourth endometrial biopsy. After completion of the examinations the patients were asked to complete a questionnaire including a visual analog scale (VAS) about their subjective appreciation of all four examinations. Two-hundred and ninety-three (72%) patients returned the questionnaire. The median (range) VAS scores for transvaginal ultrasound, SCSH, diagnostic hysteroscopy and endometrial sampling were 1.0 (0-8.1), 2.2 (0-10), 2.7 (0-10) and 5.1 (0-10), respectively (P < 0.0001). The patients' answers to the other questions about the pain experienced, including comparison with other minor procedures such as venous blood sampling, were all concordant with the VAS scores. Transvaginal ultrasound was the procedure best accepted, followed by SCSH, hysteroscopy and endometrial sampling. These results suggest that patients would prefer SCSH over hysteroscopy as an initial diagnostic approach in the evaluation of abnormal uterine bleeding.

    View details for DOI 10.1002/uog.5263

    View details for Web of Science ID 000254541900019

    View details for PubMedID 18307203

  • Expression profiling to predict the clinical behaviour of ovarian cancer fails independent evaluation BMC CANCER Gevaert, O., De Smet, F., Van Gorp, T., Pochet, N., Engelen, K., Amant, F., De Moor, B., Timmerman, D., Vergote, I. 2008; 8

    Abstract

    In a previously published pilot study we explored the performance of microarrays in predicting clinical behaviour of ovarian tumours. For this purpose we performed microarray analysis on 20 patients and estimated that we could predict advanced stage disease with 100% accuracy and the response to platin-based chemotherapy with 76.92% accuracy using leave-one-out cross validation techniques in combination with Least Squares Support Vector Machines (LS-SVMs). In the current study we evaluate whether tumour characteristics in an independent set of 49 patients can be predicted using the pilot data set with principal component analysis or LS-SVMs. The results of the principal component analysis suggest that the gene expression data from stage I, platin-sensitive advanced stage and platin-resistant advanced stage tumours in the independent data set did not correspond to their respective classes in the pilot study. Additionally, LS-SVM models built using the data from the pilot study - although they only misclassified one of four stage I tumours and correctly classified all 45 advanced stage tumours - were not able to predict resistance to platin-based chemotherapy. Furthermore, models based on the pilot data and on previously published gene sets related to ovarian cancer outcomes, did not perform significantly better than our models. We discuss possible reasons for failure of the model for predicting response to platin-based chemotherapy and conclude that existing results based on gene expression patterns of ovarian tumours need to be thoroughly scrutinized before these results can be accepted to reflect the true performance of microarray technology.

    View details for DOI 10.1186/1471-2407-8-18

    View details for Web of Science ID 000253596800002

    View details for PubMedID 18211668

  • Integration of microarray and textual data improves the prognosis prediction of breast, lung and ovarian cancer patients. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Gevaert, O., Van Vooren, S., De Moor, B. 2008: 279-290

    Abstract

    Microarray data are notoriously noisy such that models predicting clinically relevant outcomes often contain many false positive genes. Integration of other data sources can alleviate this problem and enhance gene selection and model building. Probabilistic models provide a natural solution to integrate information by using the prior over model space. We investigated if the use of text information from PUBMED abstracts in the structure prior of a Bayesian network could improve the prediction of the prognosis in cancer. Our results show that prediction of the outcome with the text prior was significantly better compared to not using a prior, both on a well known microarray data set and on three independent microarray data sets.

    View details for PubMedID 18229693

  • Classification of sporadic and BRCA1 ovarian cancer based on a genome-wide study of copy number variations KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS Daemen, A., Gevaert, O., Leunen, K., Vanspauwen, V., Michils, G., Legius, E., Vergote, I., De Moor, B. 2008; 5178: 165-?

  • Integrating microarray and proteomics data to predict the response on cetuximab in patients with rectal cancer. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daemen, A., Gevaert, O., De Bie, T., Debucquoy, A., Machiels, J., De Moor, B., Haustermans, K. 2008: 166-177

    Abstract

    To investigate the combination of cetuximab, capecitabine and radiotherapy in the preoperative treatment of patients with rectal cancer, forty tumour samples were gathered before treatment (T0), after one dose of cetuximab but before radiotherapy with capecitabine (T1) and at moment of surgery (T2). The tumour and plasma samples were subjected at all timepoints to Affymetrix microarray and Luminex proteomics analysis, respectively. At surgery, the Rectal Cancer Regression Grade (RCRG) was registered. We used a kernel-based method with Least Squares Support Vector Machines to predict RCRG based on the integration of microarray and proteomics data on T0 and T1. We demonstrated that combining multiple data sources improves the predictive power. The best model was based on 5 genes and 10 proteins at T0 and T1 and could predict the RCRG with an accuracy of 91.7%, sensitivity of 96.2% and specificity of 80%.

    View details for PubMedID 18229684

  • A framework for elucidating regulatory networks based on prior information and expression data Workshop on Dialogue on Reverse Engineering Assessment and Methods Gevaert, O., Van Vooren, S., De Moor, B. WILEY-BLACKWELL. 2007: 240–248

    Abstract

    Elucidating regulatory networks is an intensively studied topic in bioinformatics. Integration of different sources of information could facilitate this task. We propose to incorporate these information sources in the structure prior of a Bayesian network. We are currently investigating two complementary sources of information: PubMed abstracts combined with publicly available taxonomies or ontologies, and known protein-DNA interactions. These priors, either separately or combined, have the potential of reducing the complexity of reverse-engineering regulatory networks while creating more robust and reliable models. Moreover this approach can easily be extended with other data sources. In such a way Bayesian networks provide a powerful framework for data integration and regulatory network modeling.

    View details for DOI 10.1196/annals.1407.002

    View details for Web of Science ID 000252037600017

    View details for PubMedID 17925352

  • Integration of clinical and microarray data with kernel methods 29th Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society Daemen, A., Gevaert, O., De Moor, B. IEEE. 2007: 5411–5415

    Abstract

    Currently, the clinical management of cancer is based on empirical data from the literature (clinical studies) or on the expertise of the clinician. Microarray technology has recently emerged and has the potential to revolutionize the clinical management of cancer and other diseases. A microarray allows the expression levels of thousands of genes to be measured simultaneously, which may reflect diagnostic or prognostic categories and sensitivity to treatment. The objective of this paper is to investigate whether clinical data, which is the basis of day-to-day clinical decision support, can be efficiently combined with microarray data, which has yet to prove its potential to deliver patient tailored therapy, using Least Squares Support Vector Machines.

    View details for Web of Science ID 000253467004088

    View details for PubMedID 18003232

  • Molecular profiling of platinum resistant ovarian cancer: Use of the model in clinical practice INTERNATIONAL JOURNAL OF CANCER Gevaert, O., Pochet, N., De Smet, F., Van Gorp, T., De Moor, B., Timmerman, D., Amant, F., Vergote, I. 2006; 119 (6): 1511-1511

    View details for DOI 10.1002/ijc.21985

    View details for Web of Science ID 000239877200043

    View details for PubMedID 16619247

  • Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks 14th Conference on Intelligent Systems for Molecular Biology Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., De Moor, B. OXFORD UNIV PRESS. 2006: E184–E190

    Abstract

    Clinical data, such as patient history, laboratory analysis, ultrasound parameters--which are the basis of day-to-day clinical decision support--are often underused to guide the clinical management of cancer in the presence of microarray data. We propose a strategy based on Bayesian networks to treat clinical and microarray data on an equal footing. The main advantage of this probabilistic model is that it allows these data sources to be integrated in several ways and the model structure and parameters to be investigated and understood. Furthermore, using the concept of a Markov Blanket we can identify all the variables that shield off the class variable from the influence of the remaining network. Therefore Bayesian networks automatically perform feature selection by identifying the (in)dependency relationships with the class variable. We evaluated three methods for integrating clinical and microarray data: decision integration, partial integration and full integration and used them to classify publicly available data on breast cancer patients into a poor and a good prognosis group. The partial integration method is most promising and has an independent test set area under the ROC curve of 0.845. After choosing an operating point the classification performance is better than frequently used indices.

    View details for DOI 10.1093/bioinformatics/btl230

    View details for Web of Science ID 000250005000023

    View details for PubMedID 16873470
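
    The Markov blanket mentioned in the abstract above can be read directly off a network's structure: it consists of the class variable's parents, its children, and the other parents of those children. The sketch below illustrates this on an invented network; the node names and edges are hypothetical and are not taken from the paper.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG given as {node: [parent, ...]}."""
    # Children: every node that lists the target among its parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Co-parents: the other parents of those children.
    co_parents = {p for child in children for p in parents[child]}
    blanket = set(parents.get(target, ())) | children | co_parents
    blanket.discard(target)
    return blanket

# Hypothetical structure mixing one clinical variable with three genes.
parents = {
    "tumour_grade": [],
    "age": [],
    "prognosis": ["tumour_grade"],
    "gene_B": [],
    "gene_A": ["prognosis", "gene_B"],
    "gene_C": ["gene_A"],
}

blanket = markov_blanket(parents, "prognosis")
print(sorted(blanket))
```

    In this toy structure `age` and `gene_C` fall outside the blanket, so they would not be needed to predict `prognosis` once the blanket variables are known. This is the sense in which the network structure performs feature selection.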

  • Predicting the outcome of pregnancies of unknown location: Bayesian networks with expert prior information compared to logistic regression HUMAN REPRODUCTION Gevaert, O., De Smet, F., Kirk, E., Van Calster, B., Bourne, T., Van Huffel, S., Moreau, Y., Timmerman, D., De Moor, B., Condous, G. 2006; 21 (7): 1824-1831

    Abstract

    As women present at earlier gestations to early pregnancy units (EPUs), the number of women diagnosed with a pregnancy of unknown location (PUL) increases. Some of these women will have an ectopic pregnancy (EP), and it is this group in the PUL population that poses the greatest concern. The aim of this study was to develop Bayesian networks to predict EPs in the PUL population. Data were gathered in a single EPU from all women with a PUL. This data set was divided into a model-building (599 women with 44 EPs) and a validation (257 women with 22 EPs) data set and consisted of the following variables: vaginal bleeding, fluid in the pouch of Douglas, midline echo, lower abdominal pain, age, endometrial thickness, gestation days, the ratio of HCG at 48 and 0 h, progesterone levels (0 and 48 h) and the clinical outcome of the PUL. We developed Bayesian networks with expert information using this data set to predict EPs. The best Bayesian network used the gestational age, HCG ratio and the progesterone level at 48 h and had an area under the receiver operator characteristic curve (AUC) of 0.88 for predicting EPs when tested prospectively. Discrete-valued Bayesian networks are more complex to build than, for example, logistic regression. Nevertheless, we have demonstrated that such models can be used to predict EPs in a PUL population. Prospective interventional multicentre studies are needed to validate the use of such models in clinical practice.

    View details for DOI 10.1093/humrep/del083

    View details for Web of Science ID 000238907400027

    View details for PubMedID 16601010

  • Diagnostic accuracy of varying discriminatory zones for the prediction of ectopic pregnancy in women with a pregnancy of unknown location ULTRASOUND IN OBSTETRICS & GYNECOLOGY Condous, G., Kirk, E., Lu, C., Van Huffel, S., Gevaert, O., De Moor, B., De Smet, F., Timmerman, D., Bourne, T. 2005; 26 (7): 770-775

    Abstract

    Various serum human chorionic gonadotropin (hCG) discriminatory zones are currently used for evaluating the likelihood of an ectopic pregnancy in women classified as having a pregnancy of unknown location (PUL) following a transvaginal ultrasound examination. We evaluated the diagnostic accuracy of discriminatory zones for serum hCG levels of > 1000 IU/L, 1500 IU/L and 2000 IU/L for the detection of ectopic pregnancy in such women. This was a prospective observational study of women who were assessed in a specialized transvaginal scanning unit. All women with a PUL had serum hCG measured at presentation. Expectant management of PULs was adopted. These women were followed up with transvaginal ultrasound, monitoring of serum hormone levels and laparoscopy until a final diagnosis was established: a failing PUL, an intrauterine pregnancy (IUP), an ectopic pregnancy or a persisting PUL. The persisting PULs probably represented ectopic pregnancies which had been missed on ultrasound and these were incorporated into the ectopic pregnancy group. Three different discriminatory zones (1000 IU/L, 1500 IU/L and 2000 IU/L) were evaluated for predicting ectopic pregnancy in this PUL population. A total of 5544 consecutive women presented to the early pregnancy unit between 25 June 2001 and 14 April 2003. Of these, 569 (10.3%) women were classified as having a PUL, 42 of which were lost to follow up. Of the 527 (9.5%) cases with PUL analyzed, there were 300 (56.9%) failing PULs, 181 (34.3%) IUPs and 46 (8.7%) ectopic pregnancies. Overall, 74.6% were symptomatic and 25.4% were asymptomatic (P = 8.825E-07). The sensitivity and specificity of an hCG level of > 1000 IU/L to detect ectopic pregnancy were 21.7% (10/46) and 87.3% (420/481), respectively; for an hCG level of > 1500 IU/L these values were 15.2% (7/46) and 93.4% (449/481), respectively, and for an hCG level of > 2000 IU/L they were 10.9% (5/46) and 95.2% (458/481), respectively. Varying the discriminatory zone does not significantly improve the detection of ectopic pregnancy in a PUL population. A single measurement of serum hCG is not only potentially falsely reassuring but also unhelpful in excluding the presence of an ectopic pregnancy.

    View details for DOI 10.1002/uog.2636

    View details for Web of Science ID 000234027800015

    View details for PubMedID 16308901
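
    The sensitivity and specificity figures in the abstract above follow from the counts given in parentheses: 46 ectopic pregnancies and 481 other outcomes among the 527 analyzed PULs. The short sketch below recomputes them from those counts. The variable names are illustrative, and the recomputed values should agree with the reported percentages only to within about 0.1 percentage point because of rounding.

```python
N_ECTOPIC = 46   # ectopic pregnancies among analyzed PULs (from the abstract)
N_OTHER = 481    # failing PULs plus intrauterine pregnancies (300 + 181)

# Threshold in IU/L -> (ectopics above threshold, non-ectopics at or below it),
# taken from the fractions quoted in the abstract.
counts = {
    1000: (10, 420),
    1500: (7, 449),
    2000: (5, 458),
}

results = {}
for threshold, (true_pos, true_neg) in counts.items():
    sensitivity = 100.0 * true_pos / N_ECTOPIC   # detected ectopics / all ectopics
    specificity = 100.0 * true_neg / N_OTHER     # correctly excluded / all non-ectopics
    results[threshold] = (sensitivity, specificity)
    print(f"hCG > {threshold} IU/L: sensitivity {sensitivity:.1f}%, "
          f"specificity {specificity:.1f}%")
```

    Raising the threshold trades sensitivity for specificity, and sensitivity stays low at every threshold.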