Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases.
2018; 9 (1): 4735
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that the choice of deconvolution method has minimal or no effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS and LM22, across all methods. Our results demonstrate the importance of incorporating biological and technical heterogeneity into a basis matrix to achieve consistently high accuracy.
View details for DOI 10.1038/s41467-018-07242-6
View details for PubMedID 30413720
Machine learning to predict lung nodule biopsy method using CT image features: A pilot study.
Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society
2018; 71: 1–8
The effect of computed tomography (CT)-based screening on lung cancer mortality is poised to make lung nodule management a growing public health problem. Biopsy and pathologic analysis of suspicious nodules are necessary to ensure accurate diagnosis and appropriate intervention. Biopsy techniques vary, as do the specialists who perform them and the ways lung nodule patients are referred and triaged. The largest dichotomy is between minimally invasive biopsy (MIB) and surgical biopsy (SB). An unsuccessful MIB preceding an SB can considerably delay definitive care, with a potential adverse impact on prognosis in addition to potentially avoidable healthcare expenditures. An automated method that predicts the optimal biopsy method for a given lung nodule could save time and healthcare costs by facilitating referral and triage. To our knowledge, no such method has been published. Here, we used CT image features and radiologist-annotated semantic features to predict successful MIB in a way that has not been described before. Using data from the Lung Image Database Consortium image collection (LIDC-IDRI), we trained a logistic regression model to determine whether an MIB or SB procedure was used to diagnose lung cancer in a patient presenting with lung nodules. We found that in successful MIB cases, the nodules were significantly larger and more spiculated. Our model illustrates that applying robust machine learning tools to easily accessible semantic and image data can predict whether a patient's nodule is best biopsied by MIB or SB. Pending further validation and optimization, clinicians could use our publicly accessible model to aid clinical decision-making.
View details for DOI 10.1016/j.compmedimag.2018.10.006
View details for PubMedID 30448741
KLRD1-expressing natural killer cells predict influenza susceptibility
2018; 10: 45
Influenza infects tens of millions of people every year in the USA. Other than notable risk groups, such as children and the elderly, it is difficult to predict what subpopulations are at higher risk of infection. Viral challenge studies, where healthy human volunteers are inoculated with live influenza virus, provide a unique opportunity to study infection susceptibility. Biomarkers predicting influenza susceptibility would be useful for identifying risk groups and designing vaccines. We applied cell mixture deconvolution to estimate immune cell proportions from whole blood transcriptome data in four independent influenza challenge studies. We compared immune cell proportions in the blood between symptomatic shedders and asymptomatic nonshedders across three discovery cohorts prior to influenza inoculation and tested results in a held-out validation challenge cohort. Natural killer (NK) cells were significantly lower in symptomatic shedders at baseline in both discovery and validation cohorts. Hematopoietic stem and progenitor cells (HSPCs) were higher in symptomatic shedders at baseline in discovery cohorts. Although the HSPCs were higher in symptomatic shedders in the validation cohort, the increase was statistically nonsignificant. We observed that a gene associated with NK cells, KLRD1, which encodes CD94, was expressed at lower levels in symptomatic shedders at baseline in discovery and validation cohorts. KLRD1 expression in the blood at baseline negatively correlated with influenza infection symptom severity. KLRD1 expression 8 h post-infection in the nasal epithelium from a rhinovirus challenge study also negatively correlated with symptom severity. We identified KLRD1-expressing NK cells as a potential biomarker for influenza susceptibility. Expression of KLRD1 was inversely correlated with symptom severity. Our results support a model where an early response by KLRD1-expressing NK cells may control influenza infection.
View details for DOI 10.1186/s13073-018-0554-1
View details for Web of Science ID 000435421500001
View details for PubMedID 29898768
View details for PubMedCentralID PMC6001128
Empowering Multi-Cohort Gene Expression Analysis to Increase Reproducibility.
Pacific Symposium on Biocomputing
2016; 22: 144-153
A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real-world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline that implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel, interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.
View details for PubMedID 27896970
View details for PubMedCentralID PMC5167529
Integrated, Multi-cohort Analysis Identifies Conserved Transcriptional Signatures across Multiple Respiratory Viruses
2015; 43 (6): 1199-1211
Respiratory viral infections are a significant burden on healthcare worldwide. Many whole-genome expression profiling studies have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections. First, we identified a common host signature across different respiratory viral infections that could distinguish (1) individuals with viral infections from healthy controls and from those with bacterial infections, and (2) symptomatic from asymptomatic subjects prior to symptom onset in challenge studies. Second, we identified an influenza-specific host response signature that (1) could distinguish influenza-infected samples from those with bacterial and other respiratory viral infections, (2) was a diagnostic and prognostic marker in influenza-pneumonia patients and influenza challenge studies, and (3) was predictive of response to influenza vaccine. Our results have applications in the diagnosis, prognosis, and identification of drug targets in viral infections.
View details for DOI 10.1016/j.immuni.2015.11.003
View details for Web of Science ID 000366846600022
View details for PubMedID 26682989
View details for PubMedCentralID PMC4684904