Co-Division Chief, Integrative Biomedical Imaging Informatics at Stanford (2009 - Present)
M.S., Stanford University, Health Services Research (1996)
PhD, Stanford University, Electrical Engineering (1992)
B.E., The Cooper Union, Electrical Engineering (1985)
Current Research and Scholarly Interests
My research program focuses on computational modeling of cancer biology and cancer outcomes. My laboratory develops stochastic models of the natural history of cancer based on clinical research data. We estimate population-level outcomes under differing screening and treatment interventions. We also analyze genomic and proteomic cancer data in order to identify molecular networks that are perturbed in cancer initiation and progression and relate these perturbations to patient outcomes.
- Principles of Cancer Systems Biology
CBIO 243 (Spr)
Independent Studies (15)
- Bioengineering Problems and Experimental Investigation
BIOE 191 (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Investigation
BIOE 392 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Cancer Biology
CBIO 299 (Aut, Win, Spr, Sum)
- Directed Reading in Radiology
RAD 299 (Aut, Win, Spr, Sum)
- Directed Study
BIOE 391 (Aut, Win, Spr, Sum)
- Early Clinical Experience in Radiology
RAD 280 (Aut, Win, Spr, Sum)
- Graduate Research
CBIO 399 (Aut, Win, Spr, Sum)
- Graduate Research
RAD 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
RAD 370 (Aut, Win, Spr, Sum)
- Readings in Radiology Research
RAD 101 (Aut, Win, Spr, Sum)
- Teaching in Cancer Biology
CBIO 260 (Spr)
- Undergraduate Research
RAD 199 (Aut, Win, Spr, Sum)
Graduate and Fellowship Programs
Biomedical Informatics (PhD Program)
Prediction of EGFR and KRAS mutation in non-small cell lung cancer using quantitative 18F FDG-PET/CT metrics.
This study investigated the relationship between epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations in non-small cell lung cancer (NSCLC) and quantitative FDG-PET/CT parameters, including tumor heterogeneity. A total of 131 patients with NSCLC underwent staging FDG-PET/CT followed by tumor resection and histopathological analysis that included testing for EGFR and KRAS gene mutations. Patient and lesion characteristics, including smoking habits and FDG uptake parameters, were correlated with each gene mutation. Never-smoker status (p < 0.001) or a low pack-year smoking history (p = 0.002) and female gender (p = 0.047) were predictive factors for the presence of EGFR mutations. Being a current or former smoker was a predictive factor for KRAS mutations (p = 0.018). The maximum standardized uptake value (SUVmax) of FDG uptake in lung lesions was a predictive factor for EGFR mutations (p = 0.029), while metabolic tumor volume and total lesion glycolysis were not predictive. Among the tumor heterogeneity metrics included in our analysis, the inverse coefficient of variation (1/COV) was a predictive factor (p < 0.02) of EGFR mutation status, independent of metabolic tumor diameter. Multivariate analysis showed that being a never-smoker was the most significant factor (p < 0.001) for EGFR mutations in lung cancer overall. The tumor heterogeneity metric 1/COV and SUVmax were both predictive of EGFR mutations in NSCLC in univariate analysis. Overall, smoking status was the most significant factor for the presence of EGFR and KRAS mutations in lung cancer.
View details for DOI 10.18632/oncotarget.17782
View details for PubMedID 28538213
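The EGFR/KRAS FDG-PET/CT abstract names two uptake descriptors, SUVmax and the heterogeneity metric 1/COV. As a minimal sketch, assuming 1/COV is computed as the mean voxel SUV divided by the sample standard deviation of voxel SUVs within a segmented lesion (the study's exact segmentation and estimator are not stated here), both quantities can be derived from a list of voxel values. The function name `suv_metrics` is hypothetical.

```python
import math
import statistics
from typing import Sequence


def suv_metrics(voxel_suvs: Sequence[float]) -> dict:
    """Return SUVmax and the inverse coefficient of variation (1/COV) for one lesion.

    Assumption: 1/COV = mean(SUV) / sample SD(SUV) over the lesion's voxels,
    so higher values indicate more homogeneous FDG uptake.
    """
    if len(voxel_suvs) < 2:
        raise ValueError("need at least two voxels to estimate a standard deviation")
    mean_suv = statistics.fmean(voxel_suvs)
    sd_suv = statistics.stdev(voxel_suvs)  # sample (n - 1) standard deviation
    # A perfectly uniform lesion has zero spread, so 1/COV is unbounded.
    inv_cov = math.inf if sd_suv == 0 else mean_suv / sd_suv
    return {"suv_max": max(voxel_suvs), "inv_cov": inv_cov}


if __name__ == "__main__":
    # Hypothetical voxel SUVs for a single lesion (illustrative only).
    print(suv_metrics([2.0, 4.0, 6.0]))
```

Using the sample rather than population standard deviation is a design choice for this sketch; with many voxels per lesion the difference is negligible.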
Risk prediction models for selection of lung cancer screening candidates: A retrospective validation study
2017; 14 (4)
Selection of candidates for lung cancer screening based on individual risk has been proposed as an alternative to criteria based on age and cumulative smoking exposure (pack-years). Nine previously established risk models were assessed for their ability to identify those most likely to develop or die from lung cancer. All models considered age and various aspects of smoking exposure (smoking status, smoking duration, cigarettes per day, pack-years smoked, time since smoking cessation) as risk predictors. In addition, some models considered factors such as gender, race, ethnicity, education, body mass index, chronic obstructive pulmonary disease, emphysema, personal history of cancer, personal history of pneumonia, and family history of lung cancer. Retrospective analyses were performed on 53,452 National Lung Screening Trial (NLST) participants (1,925 lung cancer cases and 884 lung cancer deaths) and 80,672 Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) ever-smoking participants (1,463 lung cancer cases and 915 lung cancer deaths). Six-year lung cancer incidence and mortality risk predictions were assessed for (1) calibration (graphically) by comparing the agreement between the predicted and the observed risks, (2) discrimination (area under the receiver operating characteristic curve [AUC]) between individuals with and without lung cancer (death), and (3) clinical usefulness (net benefit in decision curve analysis) by identifying risk thresholds at which applying risk-based eligibility would improve lung cancer screening efficacy. To further assess performance, risk model sensitivities and specificities in the PLCO were compared with those based on the NLST eligibility criteria. Calibration was satisfactory, but discrimination ranged widely (AUCs from 0.61 to 0.81). The models outperformed the NLST eligibility criteria over a substantial range of risk thresholds in decision curve analysis, with a higher sensitivity for all models and a slightly higher specificity for some models. The PLCOm2012, Bach, and Two-Stage Clonal Expansion incidence models had the best overall performance, with AUCs >0.68 in the NLST and >0.77 in the PLCO. These three models had the highest sensitivity and specificity for predicting 6-y lung cancer incidence in the PLCO chest radiography arm, with sensitivities >79.8% and specificities >62.3%. In contrast, the NLST eligibility criteria yielded a sensitivity of 71.4% and a specificity of 62.2%. Limitations of this study include the lack of identification of optimal risk thresholds, as this requires additional information on the long-term benefits (e.g., life-years gained and mortality reduction) and harms (e.g., overdiagnosis) of risk-based screening strategies using these models. In addition, information on some predictor variables included in the risk prediction models was not available. Selection of individuals for lung cancer screening using individual risk is superior to selection criteria based on age and pack-years alone. The benefits, harms, and feasibility of implementing lung cancer screening policies based on risk prediction models should be assessed and compared with those of current recommendations.
View details for DOI 10.1371/journal.pmed.1002277
View details for Web of Science ID 000400768500008
View details for PubMedID 28376113
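The risk-model validation study scores eligibility rules by sensitivity, specificity, and net benefit in decision curve analysis. A minimal sketch, using the standard decision-curve definition net benefit = TP/n − (FP/n) · p_t/(1 − p_t), where everyone with predicted risk at or above the threshold p_t is selected for screening. The function names are hypothetical and the toy inputs are illustrative, not NLST or PLCO data.

```python
from typing import Sequence, Tuple


def _counts(selected: Sequence[bool], outcomes: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a boolean selection against 0/1 outcomes."""
    tp = sum(1 for s, y in zip(selected, outcomes) if s and y == 1)
    fp = sum(1 for s, y in zip(selected, outcomes) if s and y == 0)
    fn = sum(1 for s, y in zip(selected, outcomes) if not s and y == 1)
    tn = sum(1 for s, y in zip(selected, outcomes) if not s and y == 0)
    return tp, fp, fn, tn


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of screening everyone whose predicted risk is at least `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    selected = [r >= threshold for r in risks]
    tp, fp, _, _ = _counts(selected, outcomes)
    n = len(outcomes)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def sensitivity_specificity(selected: Sequence[bool], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of any eligibility rule (risk-based or criteria-based)."""
    tp, fp, fn, tn = _counts(selected, outcomes)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    risks = [0.9, 0.6, 0.2, 0.1]  # hypothetical predicted 6-year risks
    outcomes = [1, 0, 1, 0]       # hypothetical observed lung cancer (1) or not (0)
    print(net_benefit(risks, outcomes, 0.2))
    print(sensitivity_specificity([r >= 0.2 for r in risks], outcomes))
```

Because `sensitivity_specificity` takes any boolean selection, the same function can score a pack-year/age criterion and a risk-threshold rule side by side, which mirrors the comparison described in the abstract.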
Predictive radiogenomics modeling of EGFR mutation status in lung cancer
Molecular analysis of the mutation status of EGFR and KRAS is now routine in the management of non-small cell lung cancer. Radiogenomics, the linking of medical images with the genomic properties of human tumors, provides exciting opportunities for non-invasive diagnostics and prognostics. We investigated whether EGFR and KRAS mutation status can be predicted using imaging data. To accomplish this, we studied 186 cases of NSCLC with preoperative thin-slice CT scans. A thoracic radiologist annotated 89 semantic image features of each patient's tumor. Next, we built a decision tree to predict the presence of EGFR and KRAS mutations. We found a statistically significant model for predicting EGFR but not KRAS mutations. The test set area under the ROC curve for predicting EGFR mutation status was 0.89. The final decision tree used four variables: emphysema, airway abnormality, the percentage of ground-glass component, and the type of tumor margin. The presence of either of the first two features predicts wild-type EGFR status, while the presence of any ground-glass component indicates EGFR mutations. These results show the potential of quantitative imaging to predict molecular properties in a non-invasive manner, as CT imaging is more readily available than biopsies.
View details for DOI 10.1038/srep41674
View details for Web of Science ID 000393094200001
View details for PubMedID 28139704
View details for PubMedCentralID PMC5282551
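This radiogenomics study reports discrimination as the area under the ROC curve (0.89 for EGFR status). One standard way to compute an AUC, shown here as a sketch independent of the study's own software, is the Mann-Whitney formulation: the probability that a randomly chosen positive case (for example, EGFR-mutant) receives a higher model score than a randomly chosen negative case, with ties counted as one half. `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted probabilities of EGFR mutation and true labels.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in cohort size, which is fine for a few hundred cases; a rank-based formula gives the same value faster for large cohorts.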
The impact of overdiagnosis on the selection of efficient lung cancer screening strategies.
International Journal of Cancer
The U.S. Preventive Services Task Force (USPSTF) recently updated its national lung cancer screening guidelines and recommended low-dose computed tomography (LDCT) for lung cancer (LC) screening through age 80. However, the risk of overdiagnosis among older populations is a concern. Using four comparative models from the Cancer Intervention and Surveillance Modeling Network, we evaluate the overdiagnosis of the screening program recommended by USPSTF in the U.S. 1950 birth cohort. We estimate the number of LC deaths averted by screening (D) per overdiagnosed case (O), yielding the ratio D/O, to quantify the trade-off between the harms and benefits of LDCT. We analyze 576 hypothetical screening strategies that vary by age, smoking, and screening frequency and evaluate efficient screening strategies that maximize the D/O ratio and other metrics, including D and life-years gained (LYG) per overdiagnosed case. The estimated D/O ratio for the USPSTF screening program is 2.85 (model range: 1.5-4.5) in the 1950 birth cohort, implying LDCT can prevent ∼3 LC deaths per overdiagnosed case. This D/O ratio increases by 22% when the program stops screening at age 75 instead of 80. Efficiency frontier analysis shows that while the most efficient screening strategies that maximize the mortality reduction (D) irrespective of overdiagnosis screen through age 80, screening strategies that stop at age 75 versus 80 produce greater efficiency in increasing life-years gained per overdiagnosed case. Given the risk of overdiagnosis with LC screening, the stopping age of screening merits further consideration when balancing benefits and harms.
View details for DOI 10.1002/ijc.30602
View details for PubMedID 28073150
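The D/O ratio and efficiency frontier described in the overdiagnosis abstract can be illustrated with a small sketch. It assumes each strategy is summarized only by D (LC deaths averted) and O (overdiagnosed cases) and keeps the strategies that are not dominated, meaning no other strategy has both fewer-or-equal overdiagnoses and more-or-equal deaths averted while being strictly better on at least one. This is a simplified Pareto frontier without the extended-dominance step that some modeling analyses apply, and the strategy names and numbers are hypothetical, not CISNET outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Strategy:
    name: str
    deaths_averted: float  # D
    overdiagnosed: float   # O

    @property
    def d_over_o(self) -> float:
        """Deaths averted per overdiagnosed case."""
        return self.deaths_averted / self.overdiagnosed


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.overdiagnosed <= b.overdiagnosed and a.deaths_averted >= b.deaths_averted
    better = a.overdiagnosed < b.overdiagnosed or a.deaths_averted > b.deaths_averted
    return no_worse and better


def efficient_strategies(strategies: List[Strategy]) -> List[Strategy]:
    """Non-dominated strategies, ordered by increasing overdiagnosis."""
    frontier = [s for s in strategies if not any(dominates(o, s) for o in strategies)]
    return sorted(frontier, key=lambda s: s.overdiagnosed)


if __name__ == "__main__":
    candidates = [
        Strategy("hypothetical-A", deaths_averted=10.0, overdiagnosed=5.0),
        Strategy("hypothetical-B", deaths_averted=12.0, overdiagnosed=8.0),
        Strategy("hypothetical-C", deaths_averted=9.0, overdiagnosed=6.0),
    ]
    for s in efficient_strategies(candidates):
        print(s.name, round(s.d_over_o, 2))
```

In this toy example, strategy C is dropped because A averts more deaths with fewer overdiagnoses; A and B both remain, and the choice between them depends on how much additional overdiagnosis is acceptable per extra death averted.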
Visualization and cellular hierarchy inference of single-cell data using SPADE.
2016; 11 (7): 1264-1279
High-throughput single-cell technologies provide an unprecedented view into cellular heterogeneity, yet they pose new challenges in data analysis and interpretation. In this protocol, we describe the use of Spanning-tree Progression Analysis of Density-normalized Events (SPADE), a density-based algorithm for visualizing single-cell data and enabling cellular hierarchy inference among subpopulations of similar cells. It was initially developed for flow and mass cytometry single-cell data. We describe SPADE's implementation and application using an open-source R package that runs on Mac OS X, Linux and Windows systems. A typical SPADE analysis on a laptop with a 2.27-GHz processor takes ∼5 min. We demonstrate the applicability of SPADE to single-cell RNA-seq data. We compare SPADE with recently developed single-cell visualization approaches based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm. We contrast the implementation and outputs of these methods for normal and malignant hematopoietic cells analyzed by mass cytometry and provide recommendations for appropriate use. Finally, we provide an integrative strategy that combines the strengths of t-SNE and SPADE to infer cellular hierarchy from high-dimensional single-cell data.
View details for DOI 10.1038/nprot.2016.066
View details for PubMedID 27310265
Collaborative Modeling of the Benefits and Harms Associated With Different US Breast Cancer Screening Strategies
Annals of Internal Medicine
2016; 164 (4): 215-?
Background: Controversy persists about optimal mammography screening strategies. Objective: To evaluate screening outcomes, taking into account advances in mammography and treatment of breast cancer. Design: Collaboration of 6 simulation models using national data on incidence, digital mammography performance, treatment effects, and other-cause mortality. Setting: United States. Patients: Average-risk U.S. female population and subgroups with varying risk, breast density, or comorbidity. Intervention: Eight strategies differing by age at which screening starts (40, 45, or 50 years) and screening interval (annual, biennial, and hybrid [annual for women in their 40s and biennial thereafter]). All strategies assumed 100% adherence and stopped at age 74 years. Measurements: Benefits (breast cancer-specific mortality reduction, breast cancer deaths averted, life-years, and quality-adjusted life-years); number of mammograms used; harms (false-positive results, benign biopsies, and overdiagnosis); and ratios of harms (or use) and benefits (efficiency) per 1000 screens. Results: Biennial strategies were consistently the most efficient for average-risk women. Biennial screening from age 50 to 74 years avoided a median of 7 breast cancer deaths versus no screening; annual screening from age 40 to 74 years avoided an additional 3 deaths, but yielded 1988 more false-positive results and 11 more overdiagnoses per 1000 women screened. Annual screening from age 50 to 74 years was inefficient (similar benefits, but more harms than other strategies). For groups with a 2- to 4-fold increased risk, annual screening from age 40 years had similar harms and benefits as screening average-risk women biennially from 50 to 74 years. For groups with moderate or severe comorbidity, screening could stop at age 66 to 68 years. Limitation: Other imaging technologies, polygenic risk, and nonadherence were not considered. Conclusion: Biennial screening for breast cancer is efficient for average-risk populations. Decisions about starting ages and intervals will depend on population characteristics and the decision makers' weight given to the harms and benefits of screening. Primary Funding Source: National Institutes of Health.
View details for DOI 10.7326/M15-1536
View details for Web of Science ID 000370135300012
View details for PubMedID 26756606
Integrating Tumor and Stromal Gene Expression Signatures With Clinical Indices for Survival Stratification of Early-Stage Non-Small Cell Lung Cancer.
Journal of the National Cancer Institute
2015; 107 (10)
Accurate survival stratification in early-stage non-small cell lung cancer (NSCLC) could inform the use of adjuvant therapy. We developed a clinically implementable mortality risk score incorporating distinct tumor microenvironmental gene expression signatures and clinical variables. Gene expression profiles from 1106 nonsquamous NSCLCs were used for generation and internal validation of a nine-gene molecular prognostic index (MPI). A quantitative polymerase chain reaction (qPCR) assay was developed and validated on an independent cohort of formalin-fixed paraffin-embedded (FFPE) tissues (n = 98). A prognostic score using clinical variables was generated using Surveillance, Epidemiology, and End Results data and combined with the MPI. All statistical tests for survival were two-sided. The MPI stratified stage I patients into prognostic categories in three microarray and one FFPE qPCR validation cohorts (HR = 2.99, 95% CI = 1.55 to 5.76, P < .001 in stage IA patients of the largest microarray validation cohort; HR = 3.95, 95% CI = 1.24 to 12.64, P = .01 in stage IA of the qPCR cohort). Prognostic genes were expressed in distinct tumor cell subpopulations, and genes implicated in proliferation and stem cells portended poor outcomes, while genes involved in normal lung differentiation and immune infiltration were associated with superior survival. Integrating the MPI with clinical variables conferred the greatest prognostic power (HR = 3.43, 95% CI = 2.18 to 5.39, P < .001 in stage I patients of the largest microarray cohort; HR = 3.99, 95% CI = 1.67 to 9.56, P < .001 in stage I patients of the qPCR cohort). Finally, the MPI was prognostic irrespective of somatic alterations in EGFR, KRAS, TP53, and ALK. The MPI incorporates genes expressed in the tumor and its microenvironment and can be implemented clinically using qPCR assays on FFPE tissues. A composite model integrating the MPI with clinical variables provides the most accurate risk stratification.
View details for DOI 10.1093/jnci/djv211
View details for PubMedID 26286589
- ARF: Connecting senescence and innate immunity for clearance. Aging-US 2015; 7 (9): 613-615
The prognostic landscape of genes and infiltrating immune cells across human cancers
2015; 21 (8): 938-945
Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ∼18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
View details for DOI 10.1038/nm.3909
View details for Web of Science ID 000359181000022
View details for PubMedID 26193342
Mutations in early follicular lymphoma progenitors are associated with suppressed antigen presentation.
Proceedings of the National Academy of Sciences of the United States of America
2015; 112 (10): E1116-25
Follicular lymphoma (FL) is incurable with conventional therapies and has a clinical course typified by multiple relapses after therapy. These tumors are genetically characterized by B-cell leukemia/lymphoma 2 (BCL2) translocation and mutation of genes involved in chromatin modification. By analyzing purified tumor cells, we identified additional novel recurrently mutated genes and confirmed mutations of one or more chromatin modifier genes within 96% of FL tumors and two or more in 76% of tumors. We defined the hierarchy of somatic mutations arising during tumor evolution by analyzing the phylogenetic relationship of somatic mutations across the coding genomes of 59 sequentially acquired biopsies from 22 patients. Among all somatically mutated genes, CREBBP mutations were most significantly enriched within the earliest inferable progenitor. These mutations were associated with a signature of decreased antigen presentation characterized by reduced transcript and protein abundance of MHC class II on tumor B cells, in line with the role of CREBBP in promoting class II transactivator (CIITA)-dependent transcriptional activation of these genes. CREBBP mutant B cells stimulated less proliferation of T cells in vitro compared with wild-type B cells from the same tumor. Transcriptional signatures of tumor-infiltrating T cells were indicative of reduced proliferation, and this corresponded to decreased frequencies of tumor-infiltrating CD4 helper T cells and CD8 memory cytotoxic T cells. These observations therefore implicate CREBBP mutation as an early event in FL evolution that contributes to immune evasion via decreased antigen presentation.
View details for DOI 10.1073/pnas.1501199112
View details for PubMedID 25713363
View details for PubMedCentralID PMC4364211
p19ARF is a critical mediator of both cellular senescence and an innate immune response associated with MYC inactivation in mouse model of acute leukemia
2015; 6 (6): 3563-3577
MYC-induced T-ALL exhibit oncogene addiction. Addiction to MYC is a consequence of both cell-autonomous mechanisms, such as proliferative arrest, cellular senescence, and apoptosis, and non-cell-autonomous mechanisms, such as shutdown of angiogenesis and recruitment of immune effectors. Here, we show, using transgenic mouse models of MYC-induced T-ALL, that the loss of either p19ARF or p53 abrogates the ability of MYC inactivation to induce sustained tumor regression. Loss of p53 or p19ARF influenced the ability of MYC inactivation to elicit the shutdown of angiogenesis; however, the loss of p19ARF, but not p53, impeded cellular senescence, as measured by SA-beta-galactosidase staining, increased expression of p16INK4A, and specific histone modifications. Moreover, comparative gene expression analysis suggested that a multitude of genes involved in the innate immune response were expressed in p19ARF wild-type, but not null, tumors upon MYC inactivation. Indeed, the loss of p19ARF, but not p53, impeded the in situ recruitment of macrophages to the tumor microenvironment. Finally, a p19ARF null-associated gene signature prognosticated relapse-free survival in human patients with ALL. Therefore, p19ARF appears to be important in regulating cellular senescence and an innate immune response that may contribute to the therapeutic response of ALL.
View details for Web of Science ID 000352696200012
View details for PubMedID 25784651
Molecular subtyping for clinically defined breast cancer subgroups
Breast Cancer Research
Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier. We propose a subgroup-specific gene-centering method to perform molecular subtyping on a study cohort that has a skewed distribution of clinicopathological characteristics relative to the training cohort. On such a study cohort, we center each gene on a specified percentile, where the percentile is determined from a subgroup of the training cohort with clinicopathological characteristics similar to the study cohort. We demonstrate our method using the PAM50 classifier and its associated University of North Carolina (UNC) training cohort. We considered study cohorts with skewed clinicopathological characteristics, including subgroups composed of a single prototypic subtype of the UNC-PAM50 training cohort (n = 139), an external estrogen receptor (ER)-positive cohort (n = 48) and an external triple-negative cohort (n = 77). Subgroup-specific gene centering improved prediction performance, with accuracies between 77% and 100%, compared with accuracies between 17% and 33% from standard gene centering, when applied to the prototypic tumor subsets of the PAM50 training cohort. It reduced classification error rates on the ER-positive (11% versus 28%; P = 0.0389), the ER-negative (5% versus 41%; P < 0.0001) and the triple-negative (11% versus 56%; P = 0.1336) subgroups of the PAM50 training cohort. In addition, it produced higher accuracy for subtyping study cohorts composed of varying proportions of ER-positive versus ER-negative cases. Finally, it increased the percentage of assigned luminal subtypes on the external ER-positive cohort and basal-like subtype on the external triple-negative cohort. Gene centering is often necessary to accurately apply a molecular subtype classifier. Compared with standard gene centering, our proposed subgroup-specific gene centering produced more accurate molecular subtype assignments in a study cohort with skewed clinicopathological characteristics relative to the training cohort.
View details for DOI 10.1186/s13058-015-0520-4
View details for Web of Science ID 000351829500001
View details for PubMedID 25849221
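The subgroup-specific centering step in the molecular subtyping abstract can be sketched for a single gene. This is a minimal illustration under an explicit assumption: the centering percentile is taken as the percentile rank, within the training subgroup that resembles the study cohort, of the value the full training cohort is centered on (its median). The published method may derive the percentile differently, and all function names here are hypothetical. Setting the percentile to 50 reduces to standard median centering.

```python
import statistics
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of `values`, with q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be in [0, 100]")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def subgroup_percentile(training_all: Sequence[float], training_subgroup: Sequence[float]) -> float:
    """Assumed rule: percentile rank of the full-training-cohort median within the subgroup."""
    center = statistics.median(training_all)
    below = sum(1 for v in training_subgroup if v < center)
    return 100.0 * below / len(training_subgroup)


def center_gene(study_values: Sequence[float], q: float) -> List[float]:
    """Subtract the study cohort's q-th percentile from each sample's expression value."""
    c = percentile(study_values, q)
    return [v - c for v in study_values]


if __name__ == "__main__":
    # Hypothetical expression values for one gene.
    q = subgroup_percentile([1, 2, 3, 4, 5], [2, 4, 5, 6])
    print(q, center_gene([10, 20, 30, 40, 50], q))
```

In a real classifier this would run once per gene in the signature, with the centered matrix then passed to the subtype predictor.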
Pancancer analysis of DNA methylation-driven genes using MethylMix.
2015; 16: 17-?
Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet few algorithms exist that exploit the vast amount of available DNA methylation data to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues.
View details for DOI 10.1186/s13059-014-0579-8
View details for PubMedID 25631659
- Effects of Screening and Systemic Adjuvant Therapy on ER-Specific US Breast Cancer Mortality. JNCI: Journal of the National Cancer Institute 2014; 106 (11)
- Glioblastoma Multiforme: Exploratory Radiogenomic Analysis by Using Quantitative Image Features. Radiology 2014; 273 (1): 168-174
Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture
2014; 20 (7): 769-777
The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.
View details for DOI 10.1038/nm.3585
View details for Web of Science ID 000338689500021
CCAST: A Model-Based Gating Strategy to Isolate Homogeneous Subpopulations in a Heterogeneous Population of Single Cells
PLOS Computational Biology
2014; 10 (7)
A model-based gating strategy is developed for sorting cells and analyzing populations of single cells. The strategy, named CCAST, for Clustering, Classification and Sorting Tree, identifies a gating strategy for isolating homogeneous subpopulations from a heterogeneous population of single cells using a data-derived decision tree representation that can be applied to cell sorting. Because CCAST does not rely on expert knowledge, it removes human bias and variability when determining the gating strategy. It combines any clustering algorithm with silhouette measures to identify underlying homogeneous subpopulations, then applies recursive partitioning techniques to generate a decision tree that defines the gating strategy. CCAST produces an optimal strategy for cell sorting by automating the selection of gating markers, the corresponding gating thresholds, and the gating sequence; all of these parameters are typically defined manually. Even though CCAST is optimized for cell sorting, it can be applied to the identification and analysis of homogeneous subpopulations among heterogeneous single-cell data. We apply CCAST to single-cell data from both breast cancer cell lines and normal human bone marrow. On the SUM159 breast cancer cell line data, CCAST indicates at least five distinct cell states based on two surface markers (CD24 and EPCAM) and provides a gating strategy for sorting that produces more homogeneous subpopulations than previously reported. When applied to normal bone marrow data, CCAST reveals an efficient strategy for gating T-cells without prior knowledge of the major T-cell subtypes and the markers that best define them. On the normal bone marrow data, CCAST also reveals two major mature B-cell subtypes, namely CD123+ and CD123- cells, which were not revealed by manual gating but show distinct intracellular signaling responses. More generally, the CCAST framework could be used on other biological and non-biological high-dimensional data types that are mixtures of unknown homogeneous subpopulations.
View details for DOI 10.1371/journal.pcbi.1003664
View details for Web of Science ID 000339890900004
View details for PubMedID 25078380
- Comparing Benefits from Many Possible Computed Tomography Lung Cancer Screening Programs: Extrapolating from the National Lung Screening Trial Using Comparative Modeling PLOS ONE 2014; 9 (6)
- Comparative Analysis of 5 Lung Cancer Natural History and Screening Models That Reproduce Outcomes of the NLST and PLCO Trials. Cancer 2014; 120 (11): 1713-1724
Benefits and Harms of Computed Tomography Lung Cancer Screening Strategies: A Comparative Modeling Study for the US Preventive Services Task Force
Annals of Internal Medicine
2014; 160 (5): 311-?
View details for Web of Science ID 000332793900003
NF-κB protein expression associates with (18)F-FDG PET tumor uptake in non-small cell lung cancer: A radiogenomics validation study to understand tumor metabolism.
2014; 83 (2): 189-196
We previously demonstrated that NF-κB may be associated with (18)F-FDG PET uptake and patient prognosis using radiogenomics in patients with non-small cell lung cancer (NSCLC). To validate these results, we assessed NF-κB protein expression in an extended cohort of NSCLC patients. We examined NF-κBp65 by immunohistochemistry (IHC) using a tissue microarray. Staining intensity was assessed by qualitative ordinal scoring and compared with tumor FDG uptake (SUVmax and SUVmean), lactate dehydrogenase A (LDHA) expression (as a positive control), and outcome using ANOVA, Kaplan-Meier (KM), and Cox proportional hazards (CPH) analysis. In total, 365 tumors from 355 patients with long-term follow-up were analyzed. The average age of patients was 67±11 years, 46% were male, and 67% were ever smokers. Stage I and II patients comprised 83% of the cohort, and the majority had adenocarcinoma (73%). In the 88 available FDG PET scans, average SUVmax and SUVmean were 8.3±6.6 and 3.7±2.4, respectively. Increasing NF-κBp65 expression, but not LDHA expression, was associated with higher SUVmax and SUVmean (p=0.03 and 0.02, respectively). Both NF-κBp65 expression and positive FDG uptake were significantly associated with more advanced stage, tumor histology, and invasion. Higher NF-κBp65 expression was associated with death by KM analysis (p=0.06), while LDHA was strongly associated with recurrence (p=0.04). Increased levels of combined NF-κBp65 and LDHA expression were synergistic and associated with both recurrence (p=0.04) and death (p=0.03). NF-κB IHC was a modest biomarker of prognosis that associated with tumor glucose metabolism on FDG PET when compared with existing molecular correlates like LDHA, which was synergistic with NF-κB for outcome. These findings recapitulate radiogenomics profiles previously reported by our group and provide a methodology for studying tumor biology using computational approaches.
View details for DOI 10.1016/j.lungcan.2013.11.001
View details for PubMedID 24355259
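The NF-κB validation study above relies on Kaplan-Meier (KM) survival analysis. The following is a generic pure-Python sketch of the product-limit estimator S(t) = ∏ (1 − d_i / n_i), where d_i is the number of events at time t_i and n_i is the number still at risk. It is not the study's analysis code. The function name, input format, and toy data are illustrative assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, S(time)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # S(t) = S(t-) * (1 - d_t / n_t); patients censored at t still count in n_t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical, not from the study):
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Patients censored at the same time as an event are kept in the risk set for that time. This is the usual convention for the KM estimator.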
Ly6d marks the earliest stage of B-cell specification and identifies the branchpoint between B-cell and T-cell development.
Genes & development
2013; 27 (18): 2063-?
View details for PubMedID 24065771
Improvements in observed and relative survival in follicular grade 1-2 lymphoma during 4 decades: the Stanford University experience.
2013; 122 (6): 981-987
Recent studies report an improvement in overall survival (OS) of patients with follicular lymphoma (FL). Previously untreated patients with grade 1-2 FL referred from 1960-2003 and treated at Stanford were identified. Four eras were considered:
- era 1, pre-anthracycline (1960-1975, n=180);
- era 2, anthracycline (1976-1986, n=426);
- era 3, aggressive chemotherapy/purine analogs (1987-1996, n=471);
- era 4, rituximab (1997-2003, n=257).

Clinical characteristics, patterns of care, and survival outcomes were assessed. Observed OS was compared with the expected OS calculated from Berkeley Mortality Database life tables, derived from populations matched by gender and age at time of diagnosis. The median OS was 13.6 years. Age, gender, and stage did not differ across the eras. Although primary treatment varied, event-free survival after the first treatment did not differ between eras (p=0.17). Median OS improved from approximately 11 years in eras 1 and 2 to 18.4 years in era 3 and has not yet been reached for era 4 (p<0.001), with no suggestion of a plateau in any era. These improvements in OS exceeded improvements in survival in the general population during the same time period. Several factors, including better supportive care and effective therapies for relapsed disease, are likely responsible for this improvement.
View details for DOI 10.1182/blood-2013-03-491514
View details for PubMedID 23777769
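The follicular lymphoma study above compares observed OS with expected OS from population life tables. A minimal sketch of one common way to express this comparison follows. Expected survival is the product of annual survival probabilities (1 − q_x) from a life table. The relative survival ratio is observed survival divided by expected survival. This is a simplification that matches a single patient's age. Real cohort methods average expected survival over patients matched on sex, age, and calendar year. The `qx` values below are hypothetical, not Berkeley Mortality Database figures.

```python
def expected_survival(start_age, years, qx):
    """Probability of surviving `years` from `start_age` in the general
    population, given annual death probabilities qx[age] from a life table."""
    s = 1.0
    for age in range(start_age, start_age + years):
        s *= 1.0 - qx[age]
    return s


def relative_survival(observed, start_age, years, qx):
    """Ratio of observed patient survival to expected population survival."""
    return observed / expected_survival(start_age, years, qx)


# Hypothetical life-table death probabilities and observed survival:
qx = {60: 0.01, 61: 0.011, 62: 0.012}
print(round(relative_survival(0.80, 60, 3, qx), 3))
```

A relative survival ratio near 1 means patients survive about as well as the matched general population over that interval.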
- Identification of ovarian cancer driver genes by using module network integration of multi-omics data INTERFACE FOCUS 2013; 3 (4)
Feasibility evaluation of an online tool to guide decisions for BRCA1/2 mutation carriers
2013; 12 (1): 65-73
Women with BRCA1 or BRCA2 (BRCA1/2) mutations face difficult decisions about managing their high risks of breast and ovarian cancer. We developed an online tool to guide decisions about cancer risk reduction (available at: http://brcatool.stanford.edu ), and recruited patients and clinicians to test its feasibility. We developed questionnaires for women with BRCA1/2 mutations and clinicians involved in their care, incorporating the System Usability Scale (SUS) and the Center for Healthcare Evaluation Provider Satisfaction Questionnaire (CHCE-PSQ). We enrolled BRCA1/2 mutation carriers who were seen by local physicians or participating in a national advocacy organization, and we enrolled clinicians practicing at Stanford University and in the surrounding community. Forty BRCA1/2 mutation carriers and 16 clinicians participated. Both groups found the tool easy to use, with SUS scores of 82.5-85 on a scale of 1-100; we did not observe differences according to patient age or gene mutation. General satisfaction was high, with a mean score of 4.28 (standard deviation (SD) 0.96) for patients, and 4.38 (SD 0.89) for clinicians, on a scale of 1-5. Most patients (77.5%) were comfortable using the tool at home. Both patients and clinicians agreed that the decision tool could improve patient-doctor encounters (mean scores 4.50 and 4.69, on a 1-5 scale). Patients and health care providers rated the decision tool highly on measures of usability and clinical relevance. These results will guide a larger study of the tool's impact on clinical decisions.
View details for DOI 10.1007/s10689-012-9577-8
View details for Web of Science ID 000314408700008
View details for PubMedID 23086584
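The feasibility study above reports System Usability Scale (SUS) scores. For readers unfamiliar with the instrument, here is a sketch of the standard SUS scoring rule, which yields values from 0 to 100:
- Each odd-numbered item contributes (response − 1).
- Each even-numbered item contributes (5 − response).
- The sum is multiplied by 2.5.

The function name and the sample respondent are illustrative assumptions, not data from the study.

```python
def sus_score(responses):
    """Standard System Usability Scale score (0-100) from ten 1-5 Likert responses."""
    if len(responses) != 10:
        raise ValueError("the SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded (contribute r - 1);
        # even items are negatively worded (contribute 5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical respondent, for illustration only:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```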
Hierarchy in somatic mutations arising during genomic evolution and progression of follicular lymphoma.
2013; 121 (9): 1604-1611
Follicular lymphoma (FL) is currently incurable using conventional chemotherapy or immunotherapy regimens, compelling new strategies. Advances in high-throughput sequencing technologies that can reveal oncogenic pathways have stimulated interest in tailoring therapies toward actionable somatic mutations. However, for mutation-directed therapies to be most effective, the mutations must be uniformly present in evolved tumor cells as well as in the self-renewing tumor-cell precursors. Here, we show striking intratumoral clonal diversity within FL tumors in the representation of mutations in the majority of genes, as revealed by whole-exome sequencing of subpopulations. This diversity captures a clonal hierarchy. We resolved it using immunoglobulin somatic mutations and IGH-BCL2 translocations as a frame of reference and by comparing diagnosis and relapse tumor pairs, allowing us to distinguish early versus late genetic events during lymphomagenesis. We provide evidence that IGH-BCL2 translocations and CREBBP mutations are early events, whereas MLL2 and TNFRSF14 mutations probably represent late events during disease evolution. These observations provide insight into which of the genetic lesions represent suitable candidates for targeted therapies.
View details for DOI 10.1182/blood-2012-09-457283
View details for PubMedID 23297126
View details for PubMedCentralID PMC3587323
Identifying master regulators of cancer and their downstream targets by integrating genomic and epigenomic features.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Vast amounts of molecular data characterizing the genome, epigenome and transcriptome are becoming available for a variety of cancers. The current challenge is to integrate these diverse layers of molecular biology information to create a more comprehensive view of key biological processes underlying cancer. We developed a biocomputational algorithm that integrates copy number, DNA methylation, and gene expression data to study master regulators of cancer and identify their targets. Our algorithm starts by generating a list of candidate driver genes. The rationale is that genes driven by multiple genomic events in a subset of samples are unlikely to be randomly deregulated. We then select the master regulators from the candidate drivers and identify their targets by inferring the underlying regulatory network of gene expression. We applied our biocomputational algorithm to identify master regulators and their targets in glioblastoma multiforme (GBM) and serous ovarian cancer. Our results suggest that the expression of candidate drivers is more likely to be influenced by copy number variations than DNA methylation. Next, we selected the master regulators and identified their downstream targets using module networks analysis. As a proof-of-concept, we show that the GBM and ovarian cancer module networks recapitulate known processes in these cancers. In addition, we identify master regulators that have not been previously reported and suggest their likely role. In summary, focusing on genes whose expression can be explained by their genomic and epigenomic aberrations is a promising strategy to identify master regulators of cancer.
View details for PubMedID 23424118
TreeVis: A MATLAB-based tool for tree visualization
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE
2013; 109 (1): 74-76
Network-based analyses of high-dimensional biological data often produce results in the form of tree structures. Generating easily interpretable layouts to visualize these tree structures is a non-trivial task. We present a new visualization algorithm to generate two-dimensional layouts for complex tree structures. Implementations in both MATLAB and R are provided.
View details for DOI 10.1016/j.cmpb.2012.08.008
View details for Web of Science ID 000312473300007
View details for PubMedID 23036855
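The TreeVis abstract does not describe its layout algorithm, so the following is only a generic illustration of what a two-dimensional tree layout computes. It is not TreeVis's method, and it is written in Python rather than MATLAB or R:
- Leaves are placed at consecutive x positions in depth-first order.
- Each internal node is centred over its children.
- y is the node's depth.

The function name and the dictionary-of-children input format are assumptions.

```python
def layout_tree(children, root):
    """Assign (x, y) coordinates to every node of a rooted tree.

    children -- dict mapping each internal node to its list of children
    Leaves get consecutive integer x positions in depth-first order, each
    internal node is centred over its children, and y is the node's depth.
    """
    pos = {}
    next_x = 0

    def place(node, depth):
        nonlocal next_x
        kids = children.get(node, [])
        if not kids:
            pos[node] = (float(next_x), depth)
            next_x += 1
            return
        for child in kids:
            place(child, depth + 1)
        xs = [pos[c][0] for c in kids]
        pos[node] = ((min(xs) + max(xs)) / 2.0, depth)

    place(root, 0)
    return pos


# Hypothetical tree: root r with children a and b; a has leaves c and d.
print(layout_tree({"r": ["a", "b"], "a": ["c", "d"]}, "r"))
```

This simple scheme never overlaps subtrees, but it can waste horizontal space on unbalanced trees. For very deep trees, the recursion would also need to be made iterative to avoid Python's recursion limit.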
Cross-Species Functional Analysis of Cancer-Associated Fibroblasts Identifies a Critical Role for CLCF1 and IL-6 in Non-Small Cell Lung Cancer In Vivo
2012; 72 (22): 5744-5756
Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.
View details for DOI 10.1158/0008-5472.CAN-12-1097
View details for Web of Science ID 000311141300012
View details for PubMedID 22962265
CytoSPADE: high-performance analysis and visualization of high-dimensional cytometry data
2012; 28 (18): 2400-2401
MOTIVATION: Recent advances in flow cytometry enable simultaneous single-cell measurement of 30+ surface and intracellular proteins. CytoSPADE is a high-performance implementation of an interface for the Spanning-tree Progression Analysis of Density-normalized Events algorithm for tree-based analysis and visualization of this high-dimensional cytometry data. AVAILABILITY: Source code and binaries are freely available at http://cytospade.org and via Bioconductor version 2.10 onwards for Linux, OSX and Windows. CytoSPADE is implemented in R, C++ and Java. SUPPLEMENTARY INFORMATION: Additional documentation available at http://cytospade.org.
View details for DOI 10.1093/bioinformatics/bts425
View details for Web of Science ID 000308532300067
View details for PubMedID 22782546
Prognostic PET F-18-FDG Uptake Imaging Features Are Associated with Major Oncogenomic Alterations in Patients with Resected Non-Small Cell Lung Cancer
2012; 72 (15): 3725-3734
Although 2[18F]fluoro-2-deoxy-d-glucose (FDG) uptake during positron emission tomography (PET) predicts post-surgical outcome in patients with non-small cell lung cancer (NSCLC), the biologic basis for this observation is not fully understood. Here, we analyzed 25 tumors from patients with NSCLCs to identify tumor PET-FDG uptake features associated with gene expression signatures and survival. Fourteen quantitative PET imaging features describing FDG uptake were correlated with gene expression for single genes and coexpressed gene clusters (metagenes). For each FDG uptake feature, an associated metagene signature was derived, and a prognostic model was identified in an external cohort and then tested in a validation cohort of patients with NSCLC. Four of eight single genes associated with FDG uptake (LY6E, RNF149, MCM6, and FAP) were also associated with survival. The most prognostic metagene signature was associated with a multivariate FDG uptake feature [maximum standardized uptake value (SUV(max)), SUV(variance), and SUV(PCA2)], each highly associated with survival in the external [HR, 5.87; confidence interval (CI), 2.49-13.8] and validation (HR, 6.12; CI, 1.08-34.8) cohorts, respectively. Cell-cycle, proliferation, death, and self-recognition pathways were altered in this radiogenomic profile. Together, our findings suggest that leveraging tumor genomics with an expanded collection of PET-FDG imaging features may enhance our understanding of FDG uptake as an imaging biomarker beyond its association with glycolysis.
View details for DOI 10.1158/0008-5472.CAN-11-3943
View details for Web of Science ID 000307354100004
View details for PubMedID 22710433
View details for PubMedCentralID PMC3596510
Non-Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene Expression Microarray Data-Methods and Preliminary Results
2012; 264 (2): 387-396
To identify prognostic imaging biomarkers in non-small cell lung cancer (NSCLC) by means of a radiogenomics strategy. The strategy integrates gene expression and medical images in patients for whom survival outcomes are not available, by leveraging survival data in public gene expression data sets. A radiogenomics strategy for associating image features with clusters of coexpressed genes (metagenes) was defined. First, a radiogenomics correlation map is created for a pairwise association between image features and metagenes. Next, predictive models of metagenes are built in terms of image features by using sparse linear regression. Similarly, predictive models of image features are built in terms of metagenes. Finally, the prognostic significance of the predicted image features is evaluated in a public gene expression data set with survival outcomes. This radiogenomics strategy was applied to a cohort of 26 patients with NSCLC for whom gene expression and 180 image features from computed tomography (CT) and positron emission tomography (PET)/CT were available. There were 243 statistically significant pairwise correlations between image features and metagenes of NSCLC. Metagenes were predicted in terms of image features with an accuracy of 59%-83%. One hundred fourteen of 180 CT image features and the PET standardized uptake value were predicted in terms of metagenes with an accuracy of 65%-86%. When the predicted image features were mapped to a public gene expression data set with survival outcomes, tumor size, edge shape, and sharpness ranked highest for prognostic significance. This radiogenomics strategy for identifying imaging biomarkers may enable a more rapid evaluation of novel imaging modalities, thereby accelerating their translation to personalized medicine.
View details for DOI 10.1148/radiol.12111607
View details for Web of Science ID 000306660000010
View details for PubMedID 22723499
View details for PubMedCentralID PMC3401348
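The first step of the radiogenomics strategy above is a pairwise correlation map between image features and metagenes. A minimal sketch follows, under two assumptions. First, a metagene is taken to be the per-patient mean expression of its cluster's genes. Second, a fixed |r| cutoff stands in for the study's test of statistical significance. All names and values are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for constant input; treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def metagene(expression, genes):
    """Per-patient mean expression over a cluster of co-expressed genes (assumed definition)."""
    n_patients = len(expression[genes[0]])
    return [sum(expression[g][i] for g in genes) / len(genes) for i in range(n_patients)]


def correlation_map(image_features, metagenes, threshold=0.5):
    """Return {(feature, metagene): r} for every pair with |r| >= threshold."""
    hits = {}
    for f_name, f_vals in image_features.items():
        for m_name, m_vals in metagenes.items():
            r = pearson(f_vals, m_vals)
            if abs(r) >= threshold:
                hits[(f_name, m_name)] = r
    return hits


# Toy data: 4 patients, one metagene built from two genes, two image features.
expression = {"g1": [1, 2, 3, 4], "g2": [2, 3, 4, 5]}
metagenes = {"M1": metagene(expression, ["g1", "g2"])}
features = {"size": [10, 20, 30, 40], "texture": [1, -1, 1, -1]}
hits = correlation_map(features, metagenes, threshold=0.5)
print(hits)
```

In a real analysis, the cutoff would be replaced by a p-value with multiple-testing control across all feature-metagene pairs.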
A Simulation Model to Predict the Impact of Prophylactic Surgery and Screening on the Life Expectancy of BRCA1 and BRCA2 Mutation Carriers
CANCER EPIDEMIOLOGY BIOMARKERS & PREVENTION
2012; 21 (7): 1066-1077
Women with inherited mutations in the BRCA1 or BRCA2 (BRCA1/2) genes are recommended to undergo a number of intensive cancer risk-reducing strategies, including prophylactic mastectomy, prophylactic oophorectomy, and screening. We estimate the impact of different risk-reducing options at various ages on life expectancy. We apply our previously developed Monte Carlo simulation model of screening and prophylactic surgery in BRCA1/2 mutation carriers. Here, we present the mathematical formulation to compute age-specific breast cancer incidence in the absence of prophylactic oophorectomy, which is an input to the simulation model, and provide sensitivity analysis on related model parameters. The greatest gains in life expectancy result from conducting prophylactic mastectomy and prophylactic oophorectomy immediately after BRCA1/2 mutation testing. These gains vary with age at testing, from 6.8 to 10.3 years for BRCA1 and 3.4 to 4.4 years for BRCA2 mutation carriers. Life expectancy gains from delaying prophylactic surgery by 5 to 10 years range from 1 to 9.9 years for BRCA1 and 0.5 to 4.2 years for BRCA2 mutation carriers. Adding annual breast screening provides gains of 2.0 to 9.9 years for BRCA1 and 1.5 to 4.3 years for BRCA2. Results were most sensitive to variations in our assumptions about the magnitude and duration of breast cancer risk reduction due to prophylactic oophorectomy. Life expectancy gains depend on the type of BRCA mutation and age at interventions. Sensitivity analysis identifies the degree of breast cancer risk reduction due to prophylactic oophorectomy as a key determinant of life expectancy gain. Further study of the impact of prophylactic oophorectomy on breast cancer risk in BRCA1/2 mutation carriers is warranted.
View details for DOI 10.1158/1055-9965.EPI-12-0149
View details for Web of Science ID 000306210100009
View details for PubMedID 22556274
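To illustrate how a Monte Carlo model turns hazards into life expectancy gains, here is a deliberately tiny sketch. It has one cause of cancer death, annual time steps, and a risk-reducing intervention that multiplies the cancer hazard from a chosen age onward. Every hazard value and the 90% reduction are invented for illustration and are not estimates from the study. The published model separately represents breast and ovarian cancer, screening, and treatment. Each simulated woman gets her own random stream, so all strategies are compared on the same simulated women (common random numbers).

```python
import random


def background_hazard(age):
    """Hypothetical Gompertz-like annual probability of non-cancer death."""
    return min(1.0, 0.0005 * 1.09 ** (age - 30))


def cancer_hazard(age):
    """Hypothetical constant annual probability of cancer death."""
    return 0.004


def after_surgery(surgery_age, reduction):
    """Cancer hazard multiplied by (1 - reduction) from surgery_age onward."""
    def hazard(age):
        h = cancer_hazard(age)
        return h * (1.0 - reduction) if age >= surgery_age else h
    return hazard


def remaining_life_expectancy(cancer_h, start_age=30, n=2000, seed=1, max_age=100):
    """Mean simulated years of life remaining after start_age.

    Woman i always draws from the same random stream, so every strategy is
    evaluated on the same simulated women (common random numbers).
    """
    total = 0
    for i in range(n):
        rng = random.Random(seed * 1_000_003 + i)
        age = start_age
        while age < max_age:
            # One draw per year of age decides death from either cause.
            if rng.random() < background_hazard(age) + cancer_h(age):
                break
            age += 1
        total += age - start_age
    return total / n


baseline = remaining_life_expectancy(cancer_hazard)
immediate = remaining_life_expectancy(after_surgery(30, 0.9))
delayed = remaining_life_expectancy(after_surgery(40, 0.9))
print(f"gain if surgery at 30: {immediate - baseline:.2f} years")
print(f"gain if delayed to 40: {delayed - baseline:.2f} years")
```

Because the hazard under immediate surgery is never higher than under delayed surgery, each simulated woman lives at least as long with earlier intervention. The ordering of gains therefore mirrors the qualitative finding that delay reduces the benefit.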
Quantitative Proteomic Profiling Identifies Protein Correlates to EGFR Kinase Inhibition
MOLECULAR CANCER THERAPEUTICS
2012; 11 (5): 1071-1081
Clinical oncology is hampered by lack of tools to accurately assess a patient's response to pathway-targeted therapies. Serum and tumor cell surface proteins whose abundance, or change in abundance in response to therapy, differentiates responding from non-responding patients could be usefully incorporated into tools for monitoring response. Here, we posit and then verify that proteomic discovery in in vitro tissue culture models can identify proteins with concordant in vivo behavior and, further, can be a valuable approach for identifying tumor-derived serum proteins. In this study, we use stable isotope labeling by amino acids in cell culture (SILAC) with proteomic technologies to quantitatively analyze the gefitinib-related protein changes in a model system for sensitivity to EGF receptor (EGFR)-targeted tyrosine kinase inhibitors. We identified 3,707 intracellular proteins, 1,276 cell surface proteins, and 879 shed proteins. More than 75% of the proteins identified had quantitative information, and a subset consisting of 400 proteins showed a statistically significant change in abundance following gefitinib treatment. We validated the change in expression profile in vitro. We then screened our panel of response markers in an in vivo isogenic resistant model and showed that these were markers of gefitinib response and not simply markers of phospho-EGFR downregulation. In doing so, we also were able to identify which proteins might be useful as markers for monitoring response and which proteins might be useful as markers for a priori prediction of response.
View details for DOI 10.1158/1535-7163.MCT-11-0852
View details for Web of Science ID 000307984800003
View details for PubMedID 22411897
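SILAC quantification compares the "heavy" and "light" isotope channels of each peptide. A minimal sketch of summarizing peptide intensities into a per-protein log2 ratio follows. It uses the median across peptides and then applies a simple fold-change cutoff. Several details are assumptions:
- The assignment of heavy to treated cells.
- The median summary.
- The 2-fold cutoff.
- The protein labels P1 to P3.

The study itself reports statistically significant changes rather than a fixed fold-change rule.

```python
import math
from statistics import median


def protein_log2_ratios(peptides):
    """Median log2(heavy/light) ratio per protein.

    peptides -- iterable of (protein, heavy_intensity, light_intensity);
    peptides lacking a positive intensity in either channel are skipped.
    """
    per_protein = {}
    for protein, heavy, light in peptides:
        if heavy > 0 and light > 0:
            per_protein.setdefault(protein, []).append(math.log2(heavy / light))
    return {p: median(r) for p, r in per_protein.items()}


def changed_proteins(log2_ratios, min_abs_log2=1.0):
    """Proteins whose absolute log2 ratio reaches the cutoff (>= 2-fold by default)."""
    return sorted(p for p, r in log2_ratios.items() if abs(r) >= min_abs_log2)


# Toy intensities (hypothetical); heavy channel = treated, light = untreated.
peptides = [
    ("P1", 200.0, 100.0), ("P1", 400.0, 100.0),
    ("P2", 100.0, 110.0),
    ("P3", 50.0, 0.0),  # no light-channel signal: skipped
]
ratios = protein_log2_ratios(peptides)
print(ratios, changed_proteins(ratios))
```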
Online Tool to Guide Decisions for BRCA1/2 Mutation Carriers
JOURNAL OF CLINICAL ONCOLOGY
2012; 30 (5): 497-506
Women with BRCA1 or BRCA2 (BRCA1/2) mutations must choose between prophylactic surgeries and screening to manage their high risks of breast and ovarian cancer, comparing options in terms of cancer incidence, survival, and quality of life. A clinical decision tool could guide these complex choices. We built a Monte Carlo model for BRCA1/2 mutation carriers, simulating breast screening with annual mammography plus magnetic resonance imaging (MRI) from ages 25 to 69 years and prophylactic mastectomy (PM) and/or prophylactic oophorectomy (PO) at various ages. Modeled outcomes were cancer incidence, tumor features that shape treatment recommendations, overall survival, and cause-specific mortality. We adapted the model into an online tool to support shared decision making.

We compared strategies on cancer incidence and survival to age 70 years. For example:
- PO plus PM at age 25 years optimizes both outcomes (incidence, 4% to 11%; survival, 80% to 83%).
- PO at age 40 years plus MRI screening offers less effective prevention, yet similar survival (incidence, 36% to 57%; survival, 74% to 80%).

To characterize patients' treatment and survivorship experiences, we reported the tumor features and treatments associated with risk-reducing interventions. For example, in most BRCA2 mutation carriers (81%), MRI screening diagnoses stage I, hormone receptor-positive breast cancers, which may not require chemotherapy. Cancer risk-reducing options for BRCA1/2 mutation carriers vary in their impact on cancer incidence, recommended treatments, quality of life, and survival. To guide decisions informed by multiple health outcomes, we provide an online tool for joint use by patients with their physicians (http://brcatool.stanford.edu).
View details for DOI 10.1200/JCO.2011.38.6060
View details for Web of Science ID 000302622900014
View details for PubMedID 22231042
View details for PubMedCentralID PMC3295552
Comparing the benefits of screening for breast cancer and lung cancer using a novel natural history model
CANCER CAUSES & CONTROL
2012; 23 (1): 175-185
To estimate the impact of early detection of cancer, knowledge of how quickly primary tumors grow and at what size they shed lethal metastases is critical. We developed a natural history model of cancer to estimate the probability of disease-specific cure as a function of tumor size, the tumor volume doubling time (TVDT), and disease-specific mortality reduction achievable by screening. The model was applied separately to non-small-cell lung carcinoma (NSCLC) and invasive ductal carcinoma (IDC). Model parameter estimates were based on Surveillance, Epidemiology, and End Results (SEER) cancer registry datasets and validated on screening trials. Compared with IDC, NSCLC is estimated to have a lower probability of disease-specific cure at the same detected tumor size. NSCLC is also estimated to shed lethal metastases at smaller sizes (median: 19 mm for IDC versus 8 mm for NSCLC) and to have a TVDT that is almost half as long (median: 252 days for IDC versus 134 days for NSCLC). Consequently, NSCLC is associated with a lower mortality reduction from screening at the same screen detection threshold and screening interval. In summary, using a similar natural history model of cancer, we quantify the disease-specific curability attributable to screening for breast cancer, and separately lung cancer, in terms of the TVDT and onset of lethal metastases.
View details for DOI 10.1007/s10552-011-9866-9
View details for Web of Science ID 000297757400017
View details for PubMedID 22116537
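The TVDT figures above can be made concrete with the standard relation for exponential growth of a spherical tumour. If volume grows as V(t) = V0 · 2^(t / TVDT) and V is proportional to d³, then growing from diameter d0 to d1 takes t = 3 · TVDT · log2(d1 / d0) days. This textbook relation is used here for illustration. It is not necessarily the exact parameterization of the natural history model in the paper.

```python
import math


def growth_time_days(d0_mm, d1_mm, tvdt_days):
    """Days for a spherical tumour to grow from diameter d0 to d1.

    Assumes exponential volume growth V(t) = V0 * 2**(t / TVDT); since
    V is proportional to d**3, t = 3 * TVDT * log2(d1 / d0).
    """
    if d0_mm <= 0 or d1_mm <= 0:
        raise ValueError("diameters must be positive")
    return 3.0 * tvdt_days * math.log2(d1_mm / d0_mm)


# Median TVDTs quoted in the abstract: 134 days (NSCLC), 252 days (IDC).
for name, tvdt in [("NSCLC", 134), ("IDC", 252)]:
    print(name, growth_time_days(5, 10, tvdt))  # diameter 5 mm -> 10 mm
```

Doubling the diameter requires three volume doublings. At the median TVDTs this takes 402 days for NSCLC versus 756 days for IDC, which narrows the window in which screening can detect a lung tumour before it reaches a given size.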
Reconstructing Directed Signed Gene Regulatory Network From Microarray Data
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING
2011; 58 (12): 3518-3521
Great efforts have been made to develop both algorithms that reconstruct gene regulatory networks and systems that simulate gene networks and expression data, for the purpose of benchmarking network reconstruction algorithms. An interesting observation is that although many simulation systems chose to use Hill kinetics to generate data, none of the reconstruction algorithms were developed based on the Hill kinetics. One possible explanation is that, in Hill kinetics, activation and inhibition interactions take different mathematical forms, which brings additional combinatorial complexity into the reconstruction problem. We propose a new model that qualitatively behaves similarly to Hill kinetics but has the same mathematical form for both activation and inhibition. We developed an algorithm to reconstruct gene networks based on this new model. Simulation results suggested a novel biological hypothesis: in gene knockout experiments, repressing protein synthesis to a certain extent may lead to better expression data and higher network reconstruction accuracy.
View details for DOI 10.1109/TBME.2011.2163188
View details for Web of Science ID 000297341500021
View details for PubMedID 21803675
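The combinatorial issue described above comes from the different forms of the standard Hill terms:
- Activation: xⁿ / (Kⁿ + xⁿ)
- Repression: Kⁿ / (Kⁿ + xⁿ)

The sketch below shows both forms. It combines them multiplicatively for a target with several regulators, which is one common convention and an assumption here. It also shows that k regulators of unknown sign give 2^k candidate forms. The paper's own unified model is not specified in the abstract and is not reproduced here.

```python
from itertools import product


def hill_activation(x, k, n):
    """Fraction of maximal expression when regulator level x activates the target."""
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Fraction of maximal expression when regulator level x represses the target."""
    return k**n / (k**n + x**n)


def target_rate(levels, signs, k=1.0, n=2):
    """Product of per-regulator Hill terms; '+' = activation, '-' = repression (assumed combination)."""
    rate = 1.0
    for x, s in zip(levels, signs):
        rate *= hill_activation(x, k, n) if s == "+" else hill_repression(x, k, n)
    return rate


# A target with three regulators: each edge can be "+" or "-",
# so a reconstruction method must consider 2**3 = 8 sign assignments.
levels = [0.5, 2.0, 1.0]
for signs in product("+-", repeat=len(levels)):
    print("".join(signs), round(target_rate(levels, signs), 4))
```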
Lymphomas that recur after MYC suppression continue to exhibit oncogene addiction
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (42): 17432-17437
The suppression of oncogenic levels of MYC is sufficient to induce sustained tumor regression associated with proliferative arrest, differentiation, cellular senescence, and/or apoptosis, a phenomenon known as oncogene addiction. However, after prolonged inactivation of MYC in a conditional transgenic mouse model of Eμ-tTA/tetO-MYC T-cell acute lymphoblastic leukemia, some of the tumors recur, recapitulating what is frequently observed in human tumors in response to targeted therapies. Here we report that these recurring lymphomas express either transgenic or endogenous Myc, albeit in many cases at levels below those in the original tumor, suggesting that tumors continue to be addicted to MYC. Many of the recurring lymphomas (76%) harbored mutations in the tetracycline transactivator, resulting in expression of the MYC transgene even in the presence of doxycycline. Some of the remaining recurring tumors expressed high levels of endogenous Myc, which was associated with a genomic rearrangement of the endogenous Myc locus or activation of Notch1. By gene expression profiling, we confirmed that the primary and recurring tumors have highly similar transcriptomes. Importantly, shRNA-mediated suppression of the high levels of MYC in recurring tumors elicited both suppression of proliferation and increased apoptosis, confirming that these tumors remain oncogene addicted. These results suggest that tumors induced by MYC remain addicted to overexpression of this oncogene.
View details for DOI 10.1073/pnas.1107303108
View details for Web of Science ID 000295975300044
View details for PubMedID 21969595
View details for PubMedCentralID PMC3198348
Modeling the impact of population screening on breast cancer mortality in the United States
2011; 20: S75-S81
Optimal US screening strategies remain controversial. We use six simulation models to evaluate screening outcomes under varying strategies. The models incorporate common data on incidence, mammography characteristics, and treatment effects. We evaluate varying initiation and cessation ages applied annually or biennially and calculate mammograms, mortality reduction (vs. no screening), false-positives, unnecessary biopsies and over-diagnosis. The lifetime risk of breast cancer death starting at age 40 is 3% and is reduced by screening. Screening biennially maintains 81% (range 67% to 99%) of annual screening benefits with fewer false-positives. Biennial screening from 50-74 reduces the probability of breast cancer death from 3% to 2.3%. Screening annually from 40 to 84 only lowers mortality an additional one-half of one percent to 1.8% but requires substantially more mammograms and yields more false-positives and over-diagnosed cases. Decisions about screening strategy depend on preferences for benefits vs. potential harms and resource considerations.
View details for Web of Science ID 000311077400013
View details for PubMedID 22015298
Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
2011; 29 (10): 886-U181
The ability to analyze multiple single-cell parameters is critical for understanding cellular heterogeneity. Despite recent advances in measurement technology, methods for analyzing high-dimensional single-cell data are often subjective, labor intensive and require prior knowledge of the biological system. To objectively uncover cellular heterogeneity from single-cell measurements, we present a versatile computational approach, spanning-tree progression analysis of density-normalized events (SPADE). We applied SPADE to flow cytometry data of mouse bone marrow and to mass cytometry data of human bone marrow. In both cases, SPADE organized cells in a hierarchy of related phenotypes that partially recapitulated well-described patterns of hematopoiesis. We demonstrate that SPADE is robust to measurement noise and to the choice of cellular markers. SPADE facilitates the analysis of cellular heterogeneity, the identification of cell types and comparison of functional markers in response to perturbations.
View details for DOI 10.1038/nbt.1991
View details for Web of Science ID 000296273000015
View details for PubMedID 21964415
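As its name indicates, SPADE combines density normalization of events with a spanning tree over groups of related cells. Below is a generic, simplified sketch of two such ingredients. The first is density-dependent downsampling, where each cell is kept with probability min(1, target / local density). The second is a minimum spanning tree over cluster centroids, built with Prim's algorithm. Several things are assumptions or omissions:
- The clustering step is assumed to be already done, so labels are given.
- Upsampling is omitted.
- The `radius` and `target` parameters are hypothetical.

This is not the published SPADE or CytoSPADE implementation.

```python
import math
import random


def local_density(points, radius):
    """Count of points within `radius` of each point (O(n^2); illustration only)."""
    return [sum(math.dist(p, q) <= radius for q in points) for p in points]


def density_downsample(points, radius, target, rng):
    """Keep each point with probability min(1, target / local density),
    so dense regions are thinned while rare cells are retained."""
    dens = local_density(points, radius)
    return [p for p, d in zip(points, dens) if rng.random() < min(1.0, target / d)]


def centroids(points, labels):
    """Mean coordinates of each cluster, given precomputed cluster labels."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}


def minimum_spanning_tree(nodes):
    """Prim's algorithm on the complete graph of cluster centroids.

    nodes -- dict label -> coordinates; returns a list of (u, v) tree edges.
    """
    labels = list(nodes)
    if not labels:
        return []
    in_tree, edges = {labels[0]}, []
    while len(in_tree) < len(labels):
        u, v = min(((a, b) for a in in_tree for b in labels if b not in in_tree),
                   key=lambda e: math.dist(nodes[e[0]], nodes[e[1]]))
        in_tree.add(v)
        edges.append((u, v))
    return edges


# Toy 2-D "cells" (hypothetical marker values) with precomputed cluster labels.
cells = [(0, 0), (0, 0.2), (1, 0), (1, 0.2), (5, 0), (1, 3)]
labels = ["A", "A", "B", "B", "C", "D"]
tree = minimum_spanning_tree(centroids(cells, labels))
print(tree)
```

Downsampling by local density keeps rare populations represented before clustering. Linking cluster centroids with a spanning tree then yields the branching layout in which related phenotypes sit next to each other.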
Prediction of survival in diffuse large B-cell lymphoma based on the expression of 2 genes reflecting tumor and microenvironment
2011; 118 (5): 1350-1358
Several gene-expression signatures predict survival in diffuse large B-cell lymphoma (DLBCL), but the lack of practical methods for genome-scale analysis has limited translation to clinical practice. We built and validated a simple model using one gene expressed by tumor cells and another expressed by host immune cells, assessing added prognostic value to the clinical International Prognostic Index (IPI). LIM domain only 2 (LMO2) was validated as an independent predictor of survival and the "germinal center B cell-like" subtype. Expression of tumor necrosis factor receptor superfamily member 9 (TNFRSF9) from the DLBCL microenvironment was the best gene in bivariate combination with LMO2. Study of TNFRSF9 tissue expression in 95 patients with DLBCL showed expression limited to infiltrating T cells. A model integrating these 2 genes was independent of "cell-of-origin" classification, "stromal signatures," IPI, and added to the predictive power of the IPI. A composite score integrating these genes with IPI performed well in 3 independent cohorts of 545 DLBCL patients, as well as in a simple assay of routine formalin-fixed specimens from a new validation cohort of 147 patients with DLBCL. We conclude that the measurement of a single gene expressed by tumor cells (LMO2) and a single gene expressed by the immune microenvironment (TNFRSF9) powerfully predicts overall survival in patients with DLBCL.
View details for DOI 10.1182/blood-2011-03-345272
View details for Web of Science ID 000293510000028
View details for PubMedID 21670469
View details for PubMedCentralID PMC3152499
Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum
2011; 332 (6030): 687-696
Flow cytometry is an essential tool for dissecting the functional complexity of hematopoiesis. We used single-cell "mass cytometry" to examine healthy human bone marrow, measuring 34 parameters simultaneously in single cells (binding of 31 antibodies, viability, DNA content, and relative cell size). The signaling behavior of cell subsets spanning a defined hematopoietic hierarchy was monitored with 18 simultaneous markers of functional signaling states perturbed by a set of ex vivo stimuli and inhibitors. The data set allowed for an algorithmically driven assembly of related cell types defined by surface antigen expression, providing a superimposable map of cell signaling responses in combination with drug inhibition. Visualized in this manner, the analysis revealed previously unappreciated instances of both precise signaling responses that were bounded within conventionally defined cell subsets and more continuous phosphorylation responses that crossed cell population boundaries in unexpected manners yet tracked closely with cellular phenotype. Collectively, such single-cell analyses provide system-wide views of immune signaling in healthy human hematopoiesis, against which drug action and disease can be compared for mechanistic studies and pharmacologic intervention.
View details for DOI 10.1126/science.1198704
View details for Web of Science ID 000290265800035
View details for PubMedID 21551058
Discovering Biological Progression Underlying Microarray Samples
PLOS COMPUTATIONAL BIOLOGY
2011; 7 (4)
In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB cells. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
View details for DOI 10.1371/journal.pcbi.1001123
View details for Web of Science ID 000289973600007
View details for PubMedID 21533210
View details for PubMedCentralID PMC3077357
- Bayesian gene set analysis for identifying significant biological pathways JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS 2011; 60: 541-557
Association of a Leukemic Stem Cell Gene Expression Signature With Clinical Outcomes in Acute Myeloid Leukemia
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION
2010; 304 (24): 2706-2715
In many cancers, specific subpopulations of cells appear to be uniquely capable of initiating and maintaining tumors. The strongest support for this cancer stem cell model comes from transplantation assays in immunodeficient mice, which indicate that human acute myeloid leukemia (AML) is driven by self-renewing leukemic stem cells (LSCs). This model has significant implications for the development of novel therapies, but its clinical relevance has yet to be determined. To identify an LSC gene expression signature and test its association with clinical outcomes in AML. Retrospective study of global gene expression (microarray) profiles of LSC-enriched subpopulations from primary AML and normal patient samples, which were obtained at a US medical center between April 2005 and July 2007, and validation data sets of global transcriptional profiles of AML tumors from 4 independent cohorts (n = 1047). Identification of genes discriminating LSC-enriched populations from other subpopulations in AML tumors; and association of LSC-specific genes with overall, event-free, and relapse-free survival and with therapeutic response. Expression levels of 52 genes distinguished LSC-enriched populations from other subpopulations in cell-sorted AML samples. An LSC score summarizing expression of these genes in bulk primary AML tumor samples was associated with clinical outcomes in the 4 independent patient cohorts. High LSC scores were associated with worse overall, event-free, and relapse-free survival among patients with either normal karyotypes or chromosomal abnormalities. For the largest cohort of patients with normal karyotypes (n = 163), the LSC score was significantly associated with overall survival as a continuous variable (hazard ratio [HR], 1.15; 95% confidence interval [CI], 1.08-1.22; log-likelihood P < .001).
The absolute risk of death by 3 years was 57% (95% CI, 43%-67%) for the low LSC score group compared with 78% (95% CI, 66%-86%) for the high LSC score group (HR, 1.9 [95% CI, 1.3-2.7]; log-rank P = .002). In another cohort with available data on event-free survival for 70 patients with normal karyotypes, the risk of an event by 3 years was 48% (95% CI, 27%-63%) in the low LSC score group vs 81% (95% CI, 60%-91%) in the high LSC score group (HR, 2.4 [95% CI, 1.3-4.5]; log-rank P = .006). In multivariate Cox regression including age, mutations in FLT3 and NPM1, and cytogenetic abnormalities, the HRs for LSC score in the 3 cohorts with data on all variables were 1.07 (95% CI, 1.01-1.13; P = .02), 1.10 (95% CI, 1.03-1.17; P = .005), and 1.17 (95% CI, 1.05-1.30; P = .005). High expression of an LSC gene signature is independently associated with adverse outcomes in patients with AML.
View details for Web of Science ID 000285518000015
View details for PubMedID 21177505
A Simulation Model Investigating the Impact of Tumor Volume Doubling Time and Mammographic Tumor Detectability on Screening Outcomes in Women Aged 40-49 Years
JOURNAL OF THE NATIONAL CANCER INSTITUTE
2010; 102 (16): 1263-1271
Compared with women aged 50-69 years, the lower sensitivity of mammographic screening in women aged 40-49 years is largely attributed to the lower mammographic tumor detectability and faster tumor growth in the younger women. We used a Monte Carlo simulation model of breast cancer screening by age to estimate the median tumor size detectable on a mammogram and the mean tumor volume doubling time. The estimates were calculated by calibrating the predicted breast cancer incidence rates to the actual rates from the Surveillance, Epidemiology, and End Results (SEER) database and the predicted distributions of screen-detected tumor sizes to the actual distributions obtained from the Breast Cancer Surveillance Consortium (BCSC). The calibrated parameters were used to estimate the relative impact of lower mammographic tumor detectability vs faster tumor volume doubling time on the poorer screening outcomes in younger women compared with older women. Mammography screening outcomes included sensitivity, mean tumor size at detection, lifetime gained, and breast cancer mortality. In addition, the relationship between screening sensitivity and breast cancer mortality was investigated as a function of tumor volume doubling time, mammographic tumor detectability, and screening interval. Lowered mammographic tumor detectability accounted for 79% and faster tumor volume doubling time accounted for 21% of the poorer sensitivity of mammography screening in younger women compared with older women. The relative contributions were similar when the impact of screening was evaluated in terms of mean tumor size at detection, lifetime gained, and breast cancer mortality. Screening sensitivity and breast cancer mortality reduction attributable to screening were almost linearly related when comparing annual or biennial screening with no screening.
However, when comparing annual with biennial screening, the greatest reduction in breast cancer mortality attributable to screening did not correspond to the greatest gain in screening sensitivity and was more strongly affected by the mammographic tumor detectability than tumor volume doubling time. The age-specific differences in mammographic tumor detection contribute more than age-specific differences in tumor growth rates to the lowered performance of mammography screening in younger women.
View details for DOI 10.1093/jnci/djq271
View details for Web of Science ID 000281182500010
View details for PubMedID 20664027
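To make "tumor volume doubling time" concrete: under the common assumption that a tumor is spherical and grows exponentially in volume (an assumption here, not necessarily the exact growth model of the paper), the time needed to grow between two diameters scales linearly with the doubling time. The minimal sketch below uses hypothetical function names; it shows why a faster-growing tumor spends less time in any given size range, including the range in which a mammogram could detect it.

```python
import math


def sphere_volume(diameter_mm: float) -> float:
    """Volume (mm^3) of a sphere with the given diameter (mm)."""
    return math.pi * diameter_mm ** 3 / 6.0


def growth_time_days(d_start_mm: float, d_end_mm: float,
                     doubling_time_days: float) -> float:
    """Days for a spherical tumor growing exponentially in volume,
    V(t) = V0 * 2 ** (t / DT), to grow from d_start_mm to d_end_mm."""
    n_doublings = math.log2(sphere_volume(d_end_mm) / sphere_volume(d_start_mm))
    return n_doublings * doubling_time_days


# Doubling the diameter multiplies the volume by 2**3 = 8, i.e. 3 volume doublings.
for dt in (100.0, 50.0):  # hypothetical doubling times in days
    print(dt, growth_time_days(5.0, 10.0, dt))
```

Halving the doubling time halves the time spent between any two sizes, which is one way faster growth can lower screening sensitivity.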
Incidental Extracardiac Findings at Coronary CT: Clinical and Economic Impact
AMERICAN JOURNAL OF ROENTGENOLOGY
2010; 194 (6): 1531-1538
The purpose of this study was to evaluate the prevalence of incidental extracardiac findings on coronary CT, to determine the associated downstream resource utilization, and to estimate additional costs per patient related to the associated diagnostic workup. This retrospective study examined incidental extracardiac findings in 151 consecutive adults (69.5% men and 30.5% women; mean age, 54 years) undergoing coronary CT during a 7-year period. Incidental findings were recorded, and medical records were reviewed for downstream diagnostic examinations for a follow-up period of 1 year (minimum) to 7 years (maximum). Costs of further workup were estimated using 2009 Medicare average reimbursement figures. There were 102 incidental extracardiac findings in 43% (65/151) of patients. Fifty-two percent (53/102) of findings were potentially clinically significant, and 81% (43/53) of these findings were newly discovered. The radiology reports made specific follow-up recommendations for 36% (19/53) of new significant findings. Only 4% (6/151) of patients actually underwent follow-up imaging or intervention for incidental findings. One patient was found to have a malignancy that was subsequently treated. The average direct costs of additional diagnostic workup were $17.42 per patient screened (95% CI, $2.84-$32.00) and $438.39 per patient with imaging follow-up (95% CI, $301.47-$575.31). Coronary CT frequently reveals potentially significant incidental extracardiac abnormalities, yet radiologists recommend further evaluation in only one-third of cases. An even smaller fraction of cases receives further workup. The failure to follow up on abnormal incidental findings may result in missed opportunities to detect early disease, but it also limits the short-term attributable costs.
View details for DOI 10.2214/AJR.09.3587
View details for Web of Science ID 000277948400016
View details for PubMedID 20489093
View details for PubMedCentralID PMC4827619
MiDReG: A method of mining developmentally regulated genes using Boolean implications
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (13): 5732-5737
We present a method termed mining developmentally regulated genes (MiDReG) to predict genes whose expression is either activated or repressed as precursor cells differentiate. MiDReG does not require gene expression data from intermediate stages of development. MiDReG is based on the gene expression patterns between the initial and terminal stages of the differentiation pathway, coupled with "if-then" rules (Boolean implications) mined from large-scale microarray databases. MiDReG uses two gene expression-based seed conditions that mark the initial and the terminal stages of a given differentiation pathway and combines the statistically inferred Boolean implications from these seed conditions to identify the relevant genes. The method was validated by applying it to B-cell development. The algorithm predicted 62 genes that are expressed after the KIT+ progenitor cell stage and remain expressed through CD19+ and AICDA+ germinal center B cells. qRT-PCR of 14 of these genes on sorted B-cell progenitors confirmed that the expression of 10 genes is indeed stably established during B-cell differentiation. Review of the published literature on knockout mice revealed that, of the predicted genes, 63.4% have defects in B-cell differentiation and function, 22% have a role in B cells according to other experiments, and the remaining 14.6% are not characterized. Therefore, our method identified novel gene candidates for future examination of their role in B-cell development. These data demonstrate the power of MiDReG in predicting functionally important intermediate genes in a given developmental pathway that is defined by a mutually exclusive gene expression pattern.
View details for DOI 10.1073/pnas.0913635107
View details for Web of Science ID 000276159500010
View details for PubMedID 20231483
View details for PubMedCentralID PMC2851930
Reducing the Computational Complexity of Information Theoretic Approaches for Reconstructing Gene Regulatory Networks
JOURNAL OF COMPUTATIONAL BIOLOGY
2010; 17 (2): 169-176
Information theoretic approaches are increasingly being used for reconstructing regulatory networks from microarray data. These approaches start by computing the pairwise mutual information (MI) between all gene pairs. The resulting MI matrix is then manipulated to identify regulatory relationships. A barrier to these approaches is the time-consuming step of computing the MI matrix. We present a method to reduce this computation time. We apply spectral analysis to re-order the genes, so that genes that share regulatory relationships are more likely to be placed close to each other. Then, using a "sliding window" approach with appropriate window size and step size, we compute the MI for the genes within the sliding window, and the remainder is assumed to be zero. Using both simulated data and microarray data, we demonstrate that our method does not incur a performance loss in the high-precision, low-recall region, while the computational time is significantly lowered. The proposed method can be used with any method that relies on the mutual information to reconstruct networks.
View details for DOI 10.1089/cmb.2009.0052
View details for Web of Science ID 000279271200005
View details for PubMedID 20078227
View details for PubMedCentralID PMC3148830
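The reorder-then-slide idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the affinity used for the spectral reordering (absolute Pearson correlation), the histogram MI estimator, and the window and step values are all choices made here for illustration. Genes are sorted by the Fiedler vector (eigenvector of the second-smallest Laplacian eigenvalue), and MI is computed only for pairs that share a window; every other entry stays zero.

```python
import numpy as np


def histogram_mi(x, y, bins=8):
    """Plug-in mutual information (nats) estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def spectral_order(expr):
    """Order genes (rows of expr) by the Fiedler vector of the graph Laplacian
    built from absolute Pearson correlations (assumed affinity)."""
    affinity = np.abs(np.corrcoef(expr))
    np.fill_diagonal(affinity, 0.0)
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return np.argsort(eigvecs[:, 1])


def windowed_mi_matrix(expr, window=10, step=5, bins=8):
    """Compute MI only for gene pairs that fall in a common sliding window
    over the spectral ordering; all other entries are left at zero."""
    n_genes = expr.shape[0]
    order = spectral_order(expr)
    starts = list(range(0, max(n_genes - window, 0) + 1, step))
    if starts[-1] + window < n_genes:  # make sure the last genes are covered
        starts.append(n_genes - window)
    mi = np.zeros((n_genes, n_genes))
    done = np.zeros((n_genes, n_genes), dtype=bool)
    for start in starts:
        idx = order[start:start + window]
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                i, j = idx[a], idx[b]
                if not done[i, j]:  # overlapping windows share pairs
                    mi[i, j] = mi[j, i] = histogram_mi(expr[i], expr[j], bins)
                    done[i, j] = done[j, i] = True
    return mi, order


rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 50))  # synthetic: 30 genes x 50 arrays
mi, order = windowed_mi_matrix(expr, window=10, step=5)
```

For a fixed window and step, the number of MI evaluations grows roughly linearly with the number of genes instead of quadratically, which is where the savings come from.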
Survival Analysis of Cancer Risk Reduction Strategies for BRCA1/2 Mutation Carriers
JOURNAL OF CLINICAL ONCOLOGY
2010; 28 (2): 222-231
Women with BRCA1/2 mutations inherit high risks of breast and ovarian cancer; options to reduce cancer mortality include prophylactic surgery or breast screening, but their efficacy has never been empirically compared. We used decision analysis to simulate risk-reducing strategies in BRCA1/2 mutation carriers and to compare resulting survival probability and causes of death. We developed a Monte Carlo model of breast screening with annual mammography plus magnetic resonance imaging (MRI) from ages 25 to 69 years, prophylactic mastectomy (PM) at various ages, and/or prophylactic oophorectomy (PO) at ages 40 or 50 years in 25-year-old BRCA1/2 mutation carriers. With no intervention, survival probability by age 70 is 53% for BRCA1 and 71% for BRCA2 mutation carriers. The most effective single intervention for BRCA1 mutation carriers is PO at age 40, yielding a 15% absolute survival gain; for BRCA2 mutation carriers, the most effective single intervention is PM, yielding a 7% survival gain if performed at age 40 years. The combination of PM and PO at age 40 improves survival more than any single intervention, yielding 24% survival gain for BRCA1 and 11% for BRCA2 mutation carriers. PM at age 25 instead of age 40 offers minimal incremental benefit (1% to 2%); substituting screening for PM yields a similarly minimal decrement in survival (2% to 3%). Although PM at age 25 plus PO at age 40 years maximizes survival probability, substituting mammography plus MRI screening for PM seems to offer comparable survival. These results may guide women with BRCA1/2 mutations in their choices between prophylactic surgery and breast screening.
View details for DOI 10.1200/JCO.2009.22.7991
View details for Web of Science ID 000273418000010
View details for PubMedID 19996031
Effects of Mammography Screening Under Different Screening Schedules: Model Estimates of Potential Benefits and Harms
ANNALS OF INTERNAL MEDICINE
2009; 151 (10): 738-W247
Background: Despite trials of mammography and widespread use, optimal screening policy is controversial. Objective: To evaluate U.S. breast cancer screening strategies. Design: 6 models using common data elements. Data Sources: National data on age-specific incidence, competing mortality, mammography characteristics, and treatment effects. Target Population: A contemporary population cohort. Time Horizon: Lifetime. Perspective: Societal. Intervention: 20 screening strategies with varying initiation and cessation ages applied annually or biennially. Outcome Measures: Number of mammograms, reduction in deaths from breast cancer or life-years gained (vs. no screening), false-positive results, unnecessary biopsies, and overdiagnosis. Results of Base-Case Analysis: The 6 models produced consistent rankings of screening strategies. Screening biennially maintained an average of 81% (range across strategies and models, 67% to 99%) of the benefit of annual screening with almost half the number of false-positive results. Screening biennially from ages 50 to 69 years achieved a median 16.5% (range, 15% to 23%) reduction in breast cancer deaths versus no screening. Initiating biennial screening at age 40 years (vs. 50 years) reduced mortality by an additional 3% (range, 1% to 6%), consumed more resources, and yielded more false-positive results. Biennial screening after age 69 years yielded some additional mortality reduction in all models, but overdiagnosis increased most substantially at older ages. Results of Sensitivity Analysis: Varying test sensitivity or treatment patterns did not change conclusions. Limitation: Results do not include morbidity from false-positive results, patient knowledge of earlier diagnosis, or unnecessary treatment. Conclusion: Biennial screening achieves most of the benefit of annual screening with less harm. Decisions about the best strategy depend on program and individual objectives and the weight placed on benefits, harms, and resource considerations. Primary Funding Source: National Cancer Institute.
View details for Web of Science ID 000272145100007
View details for PubMedID 19920274
Modeling the transition of lung cancer from early to advanced stage
CANCER CAUSES & CONTROL
2009; 20 (9): 1559-1569
We present a stochastic parametric model of the natural history of lung cancer that predicts the primary tumor volume at the moment the disease transits from early to advanced stage. Our model also produces estimates for the probability of symptomatic detection as a function of tumor volume and clinical stage. We estimate model parameters by likelihood maximization using data from the Mayo Lung Project (MLP), which was a clinical trial that evaluated screening for lung cancer in the 1970s. Mayo Lung Project cancer cases staged as Stage III or greater, according to the 1979 AJCC staging for lung cancer, were considered advanced stage. Our estimator distinguishes between cases detected because of clinical symptoms and cases detected by screening. For non-small cell lung cancer cases detected in the MLP, we estimate that the median primary tumor diameter at the onset of advanced-stage disease was 4.1 cm. In addition, we estimate that the rate of symptomatic detection increases as the primary tumor grows, and that, for a primary tumor of a given size, the rate of symptomatic detection is 12.8 times greater among patients with advanced-stage disease than among patients with early-stage disease.
View details for DOI 10.1007/s10552-009-9401-4
View details for Web of Science ID 000271198400003
View details for PubMedID 19629730
Ly6d marks the earliest stage of B-cell specification and identifies the branchpoint between B-cell and T-cell development
GENES & DEVELOPMENT
2009; 23 (20): 2376-2381
Common lymphoid progenitors (CLPs) clonally produce both B- and T-cell lineages, but have little myeloid potential in vivo. However, some studies claim that the upstream lymphoid-primed multipotent progenitor (LMPP) is the thymic seeding population, and suggest that CLPs are primarily B-cell-restricted. To identify surface proteins that distinguish functional CLPs from B-cell progenitors, we used a new computational method of Mining Developmentally Regulated Genes (MiDReG). We identified Ly6d, which divides CLPs into two distinct populations: one that retains full in vivo lymphoid potential and produces more thymocytes at early timepoints than LMPP, and another that behaves essentially as a B-cell progenitor.
View details for DOI 10.1101/gad.1836009
View details for Web of Science ID 000270849700004
View details for PubMedID 19833765
View details for PubMedCentralID PMC2764492
Simultaneous Class Discovery and Classification of Microarray Data Using Spectral Analysis
JOURNAL OF COMPUTATIONAL BIOLOGY
2009; 16 (7): 935-944
Classification methods are commonly divided into two categories: unsupervised and supervised. Unsupervised methods can discover new classes by grouping data into clusters or tree structures without using the class labels, but they carry the risk of producing uninterpretable results. Supervised methods, on the other hand, always find decision rules that discriminate samples with different class labels. However, the class labels play such an important role that they confine supervised methods by defining the possible classes; consequently, supervised methods cannot discover new classes. To overcome the limitations of both unsupervised and supervised methods, we propose a new method that assigns the class labels a less important role so as to perform class discovery and classification simultaneously. The proposed method is called SPACC (SPectral Analysis for Class discovery and Classification). In SPACC, the training samples are nodes of an undirected weighted network. Using spectral analysis, SPACC iteratively partitions the network into a top-down binary tree. Each partitioning step is unsupervised, and the class labels are only used to define the stopping criterion. When the partitioning ends, the training samples have been divided into several subsets, each corresponding to one class label. Because multiple subsets can correspond to the same class label, SPACC may identify biologically meaningful subclasses and minimize the impact of outliers and mislabeled data. We demonstrate the effectiveness of SPACC for class discovery and classification on microarray data of lymphomas and leukemias. SPACC software is available at http://icbp.stanford.edu/software/SPACC/.
View details for DOI 10.1089/cmb.2008.0227
View details for Web of Science ID 000268172700005
View details for PubMedID 19580522
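The partitioning scheme described above (unsupervised spectral splits, labels used only to stop) can be sketched as below. This is an illustrative reconstruction, not the SPACC software: the Gaussian edge weights with a median-distance scale, splitting by the sign of the Fiedler vector, and classifying a new sample by the nearest leaf centroid are all assumptions made here, and every function name is hypothetical.

```python
from collections import Counter

import numpy as np


def fiedler_split(X):
    """Split the rows of X into two groups by the sign of the Fiedler vector
    of a fully connected, Gaussian-weighted sample graph (assumed weighting)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    positive = d2[d2 > 0]
    scale = np.median(positive) if positive.size else 1.0
    W = np.exp(-d2 / scale)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] >= 0


def build_leaves(X, labels, idx=None):
    """Top-down binary partitioning. Splits ignore the labels; labels are used
    only to stop, once every sample in a node shares one label."""
    if idx is None:
        idx = np.arange(len(labels))
    counts = Counter(labels[i] for i in idx)
    if len(counts) == 1:
        return [(idx, labels[idx[0]])]
    mask = fiedler_split(X[idx])
    if mask.all() or not mask.any():  # degenerate split: stop with majority label
        return [(idx, counts.most_common(1)[0][0])]
    return build_leaves(X, labels, idx[mask]) + build_leaves(X, labels, idx[~mask])


def classify(x, X, leaves):
    """Stand-in rule (an assumption): label of the nearest leaf centroid."""
    dists = [np.linalg.norm(x - X[idx].mean(axis=0)) for idx, _ in leaves]
    return leaves[int(np.argmin(dists))][1]


rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(15, 2)) for c in centers])
labels = np.array(["A"] * 15 + ["B"] * 15 + ["A"] * 15)  # class A has two subclasses
leaves = build_leaves(X, labels)
```

In this synthetic example class A ends up in at least two separate leaves, which is how a label-stopped tree can expose subclasses that a purely supervised boundary would merge.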
Fast calculation of pairwise mutual information for gene regulatory network reconstruction
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE
2009; 94 (2): 177-180
We present a new software implementation to more efficiently compute the mutual information for all pairs of genes from gene expression microarrays. Computation of the mutual information is a necessary first step in various information theoretic approaches for reconstructing gene regulatory networks from microarray data. When the mutual information is estimated by kernel methods, computing the pairwise mutual information is quite time-consuming. Our implementation significantly reduces the computation time. For an example data set of 336 samples consisting of normal and malignant B-cells, with 9563 genes measured per sample, the currently available ARACNE software requires 142 hours to compute the mutual information for all gene pairs, whereas our algorithm requires 1.6 hours. The increased efficiency of our algorithm improves the feasibility of applying mutual-information-based approaches for reconstructing large regulatory networks.
View details for DOI 10.1016/j.cmpb.2008.11.003
View details for Web of Science ID 000264951300007
View details for PubMedID 19167129
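One generic way to speed up kernel-based MI for all gene pairs is to compute each gene's kernel matrix once and reuse it for every pair it appears in. The sketch below illustrates that idea only; it is not the authors' implementation, and the Gaussian kernel, Silverman's rule-of-thumb bandwidth, and the function names are assumptions. With product Gaussian kernels, the normalizing constants cancel, so the density ratio at sample i reduces to n * sum_j Kx_ij Ky_ij / (sum_j Kx_ij * sum_j Ky_ij).

```python
import numpy as np


def gaussian_kernels(expr, bandwidth=None):
    """Precompute one n x n unnormalized Gaussian kernel matrix per gene
    (row of expr). Bandwidth defaults to Silverman's rule (an assumption)."""
    n = expr.shape[1]
    kernels = []
    for x in expr:
        h = bandwidth if bandwidth is not None else 1.06 * x.std() * n ** (-0.2)
        diff = x[:, None] - x[None, :]
        kernels.append(np.exp(-diff ** 2 / (2.0 * h ** 2)))
    return kernels


def pairwise_kernel_mi(expr, bandwidth=None):
    """Kernel-density MI estimate (nats) for all gene pairs, reusing the
    per-gene kernels instead of rebuilding them for every pair."""
    n_genes, n = expr.shape
    K = gaussian_kernels(expr, bandwidth)
    row_sums = [k.sum(axis=1) for k in K]  # proportional to marginal densities
    mi = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            joint = (K[i] * K[j]).sum(axis=1)  # proportional to joint density
            ratio = n * joint / (row_sums[i] * row_sums[j])
            mi[i, j] = mi[j, i] = float(np.mean(np.log(ratio)))
    return mi


rng = np.random.default_rng(2)
x = rng.normal(size=200)
# Synthetic: gene 1 tracks gene 0, gene 2 is independent of both.
expr = np.vstack([x, x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
mi = pairwise_kernel_mi(expr)
```

The trade-off is memory: storing one n x n matrix per gene is fine for small examples but would need batching for thousands of genes.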
- A Bayesian nonparametric method for model evaluation: application to genetic studies JOURNAL OF NONPARAMETRIC STATISTICS 2009; 21 (3): 379-396
Characterization of Patient Specific Signaling via Augmentation of Bayesian Networks with Disease and Patient State Nodes
Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society
IEEE. 2009: 6624–6627
Characterization of patient-specific disease features at a molecular level is an important emerging field. Patients may be characterized by differences in the level and activity of relevant biomolecules in diseased cells. When high-throughput, high-dimensional data are available, it becomes possible to characterize differences not only in the level of the biomolecules, but also in the molecular interactions among them. We propose here a novel approach to characterize patient-specific signaling, which augments high-throughput single-cell data with state nodes corresponding to patient and disease states, and learns a Bayesian network based on these data. Features distinguishing individual patients emerge as downstream nodes in the network. We illustrate this approach with a six-phospho-protein, 30,000-cell-per-patient dataset characterizing three patients with comparably diagnosed follicular lymphoma, and show that our approach elucidates signaling differences among them.
View details for Web of Science ID 000280543605113
View details for PubMedID 19963681
View details for PubMedCentralID PMC3124088
Genomic and proteomic analysis reveals a threshold level of MYC required for tumor maintenance
CANCER RESEARCH
2008; 68 (13): 5132-5142
MYC overexpression has been implicated in the pathogenesis of most types of human cancers. MYC is likely to contribute to tumorigenesis by its effects on global gene expression. Previously, we have shown that the loss of MYC overexpression is sufficient to reverse tumorigenesis. Here, we show that there is a precise threshold level of MYC expression required for maintaining the tumor phenotype, whereupon there is a switch from a gene expression program of proliferation to a state of proliferative arrest and apoptosis. Oligonucleotide microarray analysis and quantitative PCR were used to identify changes in expression in 3,921 genes, of which 2,348 were down-regulated and 1,573 were up-regulated. Critical changes in gene expression occurred at or near the MYC threshold, including genes implicated in the regulation of the G(1)-S and G(2)-M cell cycle checkpoints and death receptor/apoptosis signaling. Using two-dimensional protein analysis followed by mass spectrometry, phospho-flow fluorescence-activated cell sorting, and antibody arrays, we also identified changes at the protein level that contributed to MYC-dependent tumor regression. Proteins involved in mRNA translation decreased below threshold levels of MYC. Thus, at the MYC threshold, there is a loss of its ability to maintain tumorigenesis, with associated shifts in gene and protein expression that reestablish cell cycle checkpoints, halt protein translation, and promote apoptosis.
View details for DOI 10.1158/0008-5472.CAN-07-6192
View details for Web of Science ID 000257415300024
View details for PubMedID 18593912
Boolean implication networks derived from large scale, whole genome microarray datasets
2008; 9 (10)
We describe a method for extracting Boolean implications (if-then relationships) in very large amounts of gene expression microarray data. A meta-analysis of data from thousands of microarrays for humans, mice, and fruit flies finds millions of implication relationships between genes that would be missed by other methods. These relationships capture gender differences, tissue differences, development, and differentiation. New relationships are discovered that are preserved across all three species.
View details for Web of Science ID 000260587300020
View details for PubMedID 18973690
View details for PubMedCentralID PMC2760884
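Each Boolean implication ("if A is high then B is high", and so on) corresponds to one nearly empty quadrant in the high/low scatter plot of two genes. The sketch below is a simplified illustration of that idea, not the published method: binarizing at the median and the cutoffs on the sparsity statistic (s_min) and error rate (err_max) are assumptions, and the function name is hypothetical.

```python
import numpy as np


def boolean_implications(a, b, s_min=3.0, err_max=0.1):
    """Report implications 'if A is <state> then B is <state>' between two genes.
    Each implication forbids one quadrant; it is reported when that quadrant
    holds far fewer samples than independence predicts.
    Median thresholds and the s_min / err_max cutoffs are illustrative assumptions."""
    a_hi = a > np.median(a)
    b_hi = b > np.median(b)
    n = len(a)
    word = {False: "low", True: "high"}
    found = []
    for a_state in (False, True):
        for b_state in (False, True):
            in_a = a_hi == a_state   # samples where the premise holds
            not_b = b_hi != b_state  # samples where the conclusion fails
            observed = int(np.sum(in_a & not_b))
            expected = in_a.sum() * not_b.sum() / n
            if expected == 0:
                continue
            s = (expected - observed) / np.sqrt(expected)
            err = 0.5 * (observed / in_a.sum() + observed / not_b.sum())
            if s > s_min and err < err_max:
                found.append(f"if A {word[a_state]} then B {word[b_state]}")
    return found


a = np.arange(100, dtype=float)
print(boolean_implications(a, a))   # co-expressed genes
print(boolean_implications(a, -a))  # opposite genes
```

Perfectly co-expressed genes yield the pair "if A low then B low" and "if A high then B high" (an equivalence), while a one-way relationship would yield a single implication, which correlation alone does not distinguish.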
Extracting binary signals from microarray time-course data
NUCLEIC ACIDS RESEARCH
2007; 35 (11): 3705-3712
This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.
View details for DOI 10.1093/nar/gkm284
View details for Web of Science ID 000247817500018
View details for PubMedID 17517782
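The pattern-matching step can be illustrated for the one-transition case: fit a two-level step at every possible transition point, keep the best least-squares fit, and report its position and direction. This is a minimal sketch under stated assumptions: two-transition patterns are omitted, and the permutation p-value is a stand-in chosen here, not necessarily the test used in the paper.

```python
import numpy as np


def best_single_step(x):
    """Least-squares fit of a one-transition, two-level step to a time course.
    Returns (k, sse): time points [0, k) form one level and [k, n) the other."""
    x = np.asarray(x, dtype=float)
    best_k, best_sse = None, np.inf
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


def step_test(x, n_perm=1000, seed=0):
    """Transition index, direction ('up'/'down'), and a permutation p-value
    (assumed test: share of shuffled series whose best step fits as well)."""
    x = np.asarray(x, dtype=float)
    k, sse = best_single_step(x)
    direction = "up" if x[k:].mean() > x[:k].mean() else "down"
    rng = np.random.default_rng(seed)
    hits = sum(best_single_step(rng.permutation(x))[1] <= sse for _ in range(n_perm))
    return k, direction, (hits + 1) / (n_perm + 1)


print(step_test([0.0] * 6 + [5.0] * 6))
```

Genes can then be grouped by the returned transition index and direction, as the abstract describes, before comparing the groups with annotations.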
Ductal pattern enhancement on magnetic resonance imaging of the breast due to ductal lavage
2007; 13 (3): 281-286
Our purpose is to describe the appearance of breast ductal enhancement found on magnetic resonance imaging (MRI) after breast ductal lavage (DL). We describe a novel etiology of enhancement in a ductal pattern on postcontrast MRI of the breast. Knowledge of the potential for breast MRI enhancement subsequent to DL, which can mimic the appearance of a pathologic lesion, is critical to the care of patients who undergo breast MRI and DL or other intraductal cannulation procedures.
View details for Web of Science ID 000245992200010
View details for PubMedID 17461903
A natural history model of stage progression applied to breast cancer
STATISTICS IN MEDICINE
2007; 26 (3): 581-595
Invasive breast cancer is commonly staged as local, regional or distant disease. We present a stochastic model of the natural history of invasive breast cancer that quantifies (1) the relative rates at which the disease progresses from the local to the regional and from the regional to the distant stage, (2) the tumour volume at the stage transitions and (3) the impact of symptom-prompted detection on the tumour size and stage of invasive breast cancer in a population not screened by mammography. By symptom-prompted detection, we refer to tumour detection that results when symptoms appear that prompt the patient to seek clinical care. The model assumes exponential tumour growth and volume-dependent hazard functions for the times to symptomatic detection and stage transitions. Maximum likelihood parameter estimates are obtained based on SEER data on the tumour size and stage of invasive breast cancer from patients who were symptomatically detected in the absence of screening mammography. Our results indicate that the rate of symptom-prompted detection is similar to the rate of transition from the local to regional stage and an order of magnitude larger than the rate of transition from the regional to distant stage. We demonstrate that, even in the absence of screening mammography, symptom-prompted detection has a large effect on reducing the occurrence of distant-stage disease at initial diagnosis.
View details for DOI 10.1002/sim.2550
View details for Web of Science ID 000243511400009
View details for PubMedID 16598706
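The model structure above (exponential tumour growth plus volume-dependent hazards for detection and stage transition) can be simulated by inverting the cumulative hazard. The sketch assumes hazards that are linear in volume, gamma * V(t) for symptomatic detection and eta * V(t) for the next stage transition; the abstract says only "volume-dependent", so this form, the parameter names, and the values are illustrative. Under this assumption both hazards share the same time profile, so the chance that detection comes first is gamma / (gamma + eta) whatever the growth rate, which the Monte Carlo estimate should reproduce.

```python
import math
import random


def time_to_event(v0, r, rate, e):
    """Event time when the hazard is rate * V(t) with V(t) = v0 * exp(r * t).
    The cumulative hazard is H(t) = rate * v0 * (exp(r * t) - 1) / r;
    setting H(t) = e for an Exp(1) draw e and solving for t inverts it."""
    return math.log(1.0 + r * e / (rate * v0)) / r


def fraction_detected_before_transition(n, v0, doubling_time, gamma, eta, seed=0):
    """Monte Carlo share of tumours detected symptomatically (hazard gamma * V)
    before progressing to the next stage (hazard eta * V)."""
    rng = random.Random(seed)
    r = math.log(2.0) / doubling_time  # exponential growth rate
    hits = 0
    for _ in range(n):
        t_detect = time_to_event(v0, r, gamma, rng.expovariate(1.0))
        t_progress = time_to_event(v0, r, eta, rng.expovariate(1.0))
        hits += t_detect < t_progress
    return hits / n


# Hypothetical parameters: detection hazard three times the transition hazard.
print(fraction_detected_before_transition(20000, 1.0, 100.0, 3e-3, 1e-3))
```

Fitting such a model to SEER size and stage data, as in the paper, would mean estimating parameters like gamma and eta by maximum likelihood rather than fixing them by hand.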
Cost-effectiveness of screening BRCA1/2 mutation carriers with breast magnetic resonance imaging
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION
2006; 295 (20): 2374-2384
Women with inherited BRCA1/2 mutations are at high risk for breast cancer, which mammography often misses. Screening with contrast-enhanced breast magnetic resonance imaging (MRI) detects cancer earlier but increases costs and results in more false-positive scans. To evaluate the cost-effectiveness of screening BRCA1/2 mutation carriers with mammography plus breast MRI compared with mammography alone. A computer model that simulates the life histories of individual BRCA1/2 mutation carriers, incorporating the effects of mammographic and MRI screening, was used. The accuracy of mammography and breast MRI was estimated from published data in high-risk women. Breast cancer survival in the absence of screening was based on the Surveillance, Epidemiology, and End Results database of breast cancer patients diagnosed in the prescreening period (1975-1981), adjusted for the current use of adjuvant therapy. Utilization rates and costs of diagnostic and treatment interventions were based on a combination of published literature and Medicare payments for 2005. The survival benefit, incremental costs, and cost-effectiveness of MRI screening strategies, which varied by ages of starting and stopping MRI screening, were computed separately for BRCA1 and BRCA2 mutation carriers. Screening strategies that incorporate annual MRI as well as annual mammography have a cost per quality-adjusted life-year (QALY) gained ranging from less than $45,000 to more than $700,000, depending on the ages selected for MRI screening and the specific BRCA mutation. Relative to screening with mammography alone, the cost per QALY gained by adding MRI from ages 35 to 54 years is $55,420 for BRCA1 mutation carriers, $130,695 for BRCA2 mutation carriers, and $98,454 for BRCA2 mutation carriers who have mammographically dense breasts. Breast MRI screening is more cost-effective for BRCA1 than BRCA2 mutation carriers.
The cost-effectiveness of adding MRI to mammography varies greatly by age.
View details for Web of Science ID 000237734400023
View details for PubMedID 16720823
A stochastic simulation model of U.S. breast cancer mortality trends from 1975 to 2000.
Journal of the National Cancer Institute. Monographs
We present a simulation model that predicts U.S. breast cancer mortality trends from 1975 to 2000 and quantifies the impact of screening mammography and adjuvant therapy on these trends. This model was developed within the Cancer Intervention and Surveillance Modeling Network (CISNET) consortium. A Monte Carlo simulation was developed to generate the life histories of individual breast cancer patients using CISNET base case inputs that describe the secular trend in breast cancer risk, dissemination patterns for screening mammography and adjuvant treatment, and death from causes other than breast cancer. The model generates the patient's age, tumor size and stage at detection, mode of detection, age at death, and cause of death (breast cancer versus other), based in part on assumptions about the natural history of breast cancer. Outcomes from multiple birth cohorts are summarized in terms of breast cancer mortality rates by calendar year. Predicted breast cancer mortality rates follow the general shape of U.S. breast cancer mortality rates from 1975 to 1995 but level off after 1995 instead of following the observed decline. Sensitivity analysis revealed that the impact of adjuvant treatment may be underestimated, given the lack of data on temporal variation in treatment efficacy. We developed a simulation model that uses CISNET base case inputs and closely, but not exactly, reproduces U.S. breast cancer mortality rates. Screening mammography and adjuvant therapy are both shown to have contributed to the decline in U.S. breast cancer mortality.
View details for PubMedID 17032898
A comparative review of CISNET breast models used to analyze U.S. breast cancer incidence and mortality trends.
Journal of the National Cancer Institute. Monographs
The CISNET Breast Cancer program is a National Cancer Institute-sponsored collaboration composed of seven research groups that have modeled the impact of screening and adjuvant treatment on trends in breast cancer incidence and mortality over the period 1975-2000 (base case). This collaboration created a unique opportunity to directly compare results from different models of population-based cancer screening produced in response to the same question. Comparing results in anything but the most cursory way requires comparing the models themselves. Previous chapters have discussed the individual models in detail. This chapter aims to help the reader understand the key areas of difference between the models. A focused analysis of the differences and similarities between the models is presented, with special attention paid to the areas deemed most likely to contribute substantially to the results of the target analysis.
View details for PubMedID 17032899
Impact of adjuvant therapy and mammography on U.S. mortality from 1975 to 2000: comparison of mortality results from the CISNET breast cancer base case analysis.
Journal of the National Cancer Institute. Monographs
The CISNET breast cancer program is a consortium of seven research groups modeling the impact of various cancer interventions on the national trends of breast cancer incidence and mortality. Each of the modeling groups participated in a CISNET breast cancer base case analysis with the objective of assessing the impact of mammography and adjuvant therapy on breast cancer mortality between 1975 and 2000. The comparative modeling approach used to address this question allowed for a unique view into the process of modeling. Results shown here expand on those recently reported in the New England Journal of Medicine (Berry et al., N Engl J Med 2005;353:1784-92) by presenting mortality impact in several different ways to facilitate comparisons between models. Comparisons of each group's results in the context of modeling assumptions made during the process gave insight into how specific model assumptions may have affected the results. The median estimate for the percent decline in breast cancer mortality due to mammography was 15% (range of 8%-23%), and the median estimate for the percent decline in mortality due to adjuvant treatment was 19% (range of 12%-21%). A detailed discussion of the differences in modeling approaches and how those differences may have influenced the mortality results concludes the chapter.
View details for PubMedID 17032901
Effect of screening and adjuvant therapy on mortality from breast cancer
NEW ENGLAND JOURNAL OF MEDICINE
2005; 353 (17): 1784-1792
We used modeling techniques to assess the relative and absolute contributions of screening mammography and adjuvant treatment to the reduction in breast-cancer mortality in the United States from 1975 to 2000. A consortium of investigators developed seven independent statistical models of breast-cancer incidence and mortality. All seven groups used the same sources to obtain data on the use of screening mammography, adjuvant treatment, and benefits of treatment with respect to the rate of death from breast cancer. The proportion of the total reduction in the rate of death from breast cancer attributed to screening varied in the seven models from 28 to 65 percent (median, 46 percent), with adjuvant treatment contributing the rest. The variability across models in the absolute contribution of screening was larger than it was for treatment, reflecting the greater uncertainty associated with estimating the benefit of screening. Seven statistical models showed that both screening mammography and treatment have helped reduce the rate of death from breast cancer in the United States.
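One simple way to express the screening share of a mortality reduction is sketched below. It assumes a model reports mortality under a screening-only and an adjuvant-only scenario against a common no-intervention baseline, and that the two reductions are simply added; the function `screening_share` and the rates in the example are hypothetical, and the published analysis may have apportioned the reduction differently.

```python
def screening_share(rate_none: float, rate_screen_only: float, rate_treat_only: float) -> float:
    """Fraction of the combined mortality reduction attributed to screening.

    Hypothetical convention: each reduction is measured against the same
    no-intervention baseline rate and the two reductions are summed.
    """
    reduction_screen = rate_none - rate_screen_only
    reduction_treat = rate_none - rate_treat_only
    total = reduction_screen + reduction_treat
    if total <= 0:
        raise ValueError("interventions must reduce mortality for a share to be defined")
    return reduction_screen / total


# Hypothetical rates per 100,000 women: screening removes 4 deaths, adjuvant therapy 6.
share = screening_share(50.0, 46.0, 44.0)
print(f"screening {share:.0%}, adjuvant treatment {1 - share:.0%}")  # screening 40%, adjuvant treatment 60%
```

Under this convention, "adjuvant treatment contributing the rest" is simply one minus the screening share, and running the same calculation on each of several models gives a spread of shares like the one reported above.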
View details for Web of Science ID 000232813000006
View details for PubMedID 16251534
Decision analysis and simulation modeling for evaluating diagnostic tests on the basis of patient outcomes
AMERICAN JOURNAL OF ROENTGENOLOGY
2005; 185 (3): 581-590
The effect of age, race, tumor size, tumor grade, and disease stage on invasive ductal breast cancer survival in the US SEER database
BREAST CANCER RESEARCH AND TREATMENT
2005; 89 (1): 47-54
To examine the effect of patient and tumor characteristics on breast cancer survival as recorded in the U.S. National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database from 1973 to 1998. A sample of 72,367 female cases from 1973 to 1998 aged 21-90 years with invasive ductal breast cancer were examined with Cox proportional hazards regression to determine the effect of age at diagnosis, race, tumor size, tumor grade, disease stage, and year of diagnosis on disease-specific survival. Larger tumor size and higher tumor grade were found to have large negative effects on survival. Blacks had a 47% greater risk of death than whites. Year of diagnosis had a positive effect, with a 15% reduction in risk for each decade in the time period under study. The effects of patient age and disease stage violated the proportional hazards assumption, with distant disease having much poorer short-term survival than one would expect from a proportional hazards model, and younger age groups matching or even falling below the survival rate of the oldest group over time. Tumor size, grade, race, and year of diagnosis all have significant constant effects on disease-specific survival in breast cancer, while the effects of age at diagnosis and disease stage have significant effects that vary over time.
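The percentages above are hazard ratios from a Cox model. A short worked reading, assuming the reported figures are hazard ratios for a binary race indicator and for year of diagnosis entered log-linearly in decades (the coding is an assumption, not stated in the abstract):

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad \mathrm{HR}_j = e^{\beta_j}

% Race: a 47% greater risk of death
e^{\beta_{\text{race}}} = 1.47 \;\Rightarrow\; \beta_{\text{race}} = \ln 1.47 \approx 0.385

% Year of diagnosis: a 15% reduction per decade, compounded over 1973-1998 (2.5 decades)
e^{\beta_{\text{decade}}} = 0.85 \;\Rightarrow\; \mathrm{HR}_{1998\ \text{vs}\ 1973} = 0.85^{2.5} \approx 0.67

% Age and stage violate proportional hazards: their coefficients depend on t
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\sum_j \beta_j(t)\, x_j\right)
```

The last line is the general form of a time-varying effect; it shows why a single constant hazard ratio cannot summarize the age and stage effects described above.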
View details for Web of Science ID 000227280200007
View details for PubMedID 15666196
Breast magnetic resonance image screening and ductal lavage in women at high genetic risk for breast carcinoma
CANCER
2004; 100 (3): 479-489
Intensive screening is an alternative to prophylactic mastectomy in women at high risk for developing breast carcinoma. The current article reports preliminary results from a screening protocol using high-quality magnetic resonance imaging (MRI), ductal lavage (DL), clinical breast examination, and mammography to identify early malignancy and high-risk lesions in women at increased genetic risk of breast carcinoma. Women with inherited BRCA1 or BRCA2 mutations or women with a >10% risk of developing breast carcinoma at 10 years, as estimated by the Claus model, were eligible. Patients were accrued from September 2001 to May 2003. Enrolled patients underwent biannual clinical breast examinations and annual mammography, breast MRI, and DL. Forty-one women underwent an initial screen. Fifteen of 41 enrolled women (36.6%) either had undergone previous bilateral oophorectomy and/or were on tamoxifen at the time of the initial screen. One patient who was a BRCA1 carrier had high-grade ductal carcinoma in situ (DCIS) that was screen detected by MRI but that was missed on mammography. High-risk lesions that were screen detected by MRI in three women included radial scars and atypical lobular hyperplasia. DL detected seven women with cellular atypia, including one woman who had a normal MRI and mammogram. Breast MRI identified high-grade DCIS and high-risk lesions that were missed by mammography. DL detected cytologic atypia in a high-risk cohort. A larger screening trial is needed to determine which subgroups of high-risk women will benefit and whether the identification of malignant and high-risk lesions at an early stage will impact breast carcinoma incidence and mortality.
View details for DOI 10.1002/cncr.11926
View details for Web of Science ID 000188611400006
View details for PubMedID 14745863
Simulation-based parameter estimation for complex models: a breast cancer natural history modelling illustration
STATISTICAL METHODS IN MEDICAL RESEARCH
2004; 13 (6): 507-524
Simulation-based parameter estimation offers a powerful means of estimating parameters in complex stochastic models. We illustrate the application of these ideas in the setting of a natural history model for breast cancer. Our model assumes that the tumor growth process follows a geometric Brownian motion; parameters are estimated from the SEER registry. Our discussion focuses on the use of simulation for computing the maximum likelihood estimator for this class of models. The analysis shows that simulation provides a straightforward means of computing such estimators for models of substantial complexity.
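The idea of computing a maximum likelihood estimator by simulation can be shown on a toy geometric Brownian motion (GBM) tumor-growth model. This is a minimal sketch under stated assumptions, not the paper's estimator: observed data are hypothetical log tumor sizes at a fixed time, the likelihood of each observation is approximated by a Gaussian kernel density over simulated paths, and the drift is chosen by grid search. The same random seed is reused for every candidate drift (common random numbers) so the simulated likelihood varies smoothly. GBM has a closed-form MLE, which makes it a convenient check.

```python
import math
import random


def simulate_log_size(rng, mu, sigma, v0, t_end, n_steps=10):
    """Simulate one GBM tumor-size path and return the log size at t_end.

    For GBM, log V is Brownian motion with drift (mu - sigma^2 / 2), so summing
    Gaussian log-increments is exact at the grid times.
    """
    dt = t_end / n_steps
    log_v = math.log(v0)
    for _ in range(n_steps):
        log_v += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return log_v


def simulated_loglik(obs, mu, sigma, v0, t_end, n_sim=500, bandwidth=0.1, seed=1):
    """Monte Carlo estimate of the log-likelihood of observed log sizes under drift mu."""
    rng = random.Random(seed)  # same seed for every mu: common random numbers
    sims = [simulate_log_size(rng, mu, sigma, v0, t_end) for _ in range(n_sim)]
    scale = 1.0 / (n_sim * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for y in obs:
        # Gaussian kernel density estimate of the simulated distribution, evaluated at y
        dens = scale * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sims)
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def simulated_mle(obs, grid, sigma, v0, t_end):
    """Grid-search maximizer of the simulated log-likelihood over the drift mu."""
    return max(grid, key=lambda mu: simulated_loglik(obs, mu, sigma, v0, t_end))
```

Usage: generate hypothetical data with `simulate_log_size` at a known drift, then call `simulated_mle(obs, [i / 100 for i in range(41)], 0.3, 1.0, 5.0)`. The result should land near the closed-form estimate `mean(obs) / 5.0 + 0.3 ** 2 / 2` (for `v0 = 1.0`). In realistic natural history models no such formula exists, and the simulated likelihood is the only practical route.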
View details for DOI 10.1191/0962280204sm380ra
View details for Web of Science ID 000225102100006
View details for PubMedID 15587436
Diversity of model approaches for breast cancer screening: a review of model assumptions by The Cancer Intervention and Surveillance Network (CISNET) Breast Cancer Groups
STATISTICAL METHODS IN MEDICAL RESEARCH
2004; 13 (6): 525-538
The National Cancer Institute-sponsored Cancer Intervention and Surveillance Network (CISNET) program on breast cancer is composed of seven research groups working largely independently to model the impact of screening and adjuvant therapy on breast cancer mortality trends in the US from 1975 to 2000. Each group chose a different modeling methodology, without any deliberate attempt to contrast with the others. The seven groups have met biannually since November 2000 to discuss their methodology and results. This article investigates the differences in methodology. To facilitate this comparison, each group submitted a description of its model to a uniformly structured, web-based 'model profiler'. Six of the seven models simulate a preclinical natural history that cannot be observed directly, with parameters estimated from published evidence on screening and therapy effects. The remaining model treats published evidence on intervention effects as prior information and updates it with information from the US population in a Bayesian-type analysis. In general, the differences between the models appear to be small, particularly among the models driven by natural history assumptions. However, we demonstrate that such apparently small differences can have a large impact on surveillance of population trends. We describe a systematic approach to evaluating differences in model assumptions and results, as well as differences in the modeling culture underlying the differences in model structure and parameters.
View details for DOI 10.1191/0962280204sm381ra
View details for Web of Science ID 000225102100007
View details for PubMedID 15587437
SPECTRAL EXTRAPOLATION OF SPATIALLY BOUNDED IMAGES
IEEE TRANSACTIONS ON MEDICAL IMAGING
1995; 14 (3): 487-497
A spectral extrapolation algorithm for spatially bounded images is presented. An image is said to be spatially bounded when it is confined to a closed region and is surrounded by a background of zeros. With prior knowledge of the spatial-domain zeros, the extrapolation algorithm extends the image's spectrum beyond a known interval of low-frequency components. The result, referred to as the finite support solution, has space-variant resolution: features near the edge of the support region are better resolved than those in the center. The resolution of the finite support solution is discussed as a function of the number of known spatial zeros and known spectral components. A regularized version of the finite support solution is included to handle the case where the known spectral components are noisy. For both the noiseless and noisy cases, the resolution of the finite support solution is measured in terms of its impulse response characteristics and compared with the resolution of the zero-filled and Nyquist solutions. The finite support solution is superior to the zero-filled solution for both noisy and noiseless data. When compared with the Nyquist solution, the finite support solution may be preferred in the noisy data case. Examples using medical image data are provided.
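The principle above (using known spatial zeros to recover spectral components beyond the measured low-frequency band) can be illustrated in one dimension with the classical Gerchberg-Papoulis iteration. This technique is swapped in for illustration and is not claimed to be the paper's finite support solution. It alternates between zeroing the signal outside its known support and restoring the measured low-frequency DFT coefficients. The signal, support, and band limit in the usage note are hypothetical, and the plain O(N^2) DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse DFT matching `dft` (includes the 1/N factor)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def gerchberg_papoulis(known_spectrum, support, n, iterations=50):
    """Extrapolate the spectrum of a spatially bounded, real 1-D signal.

    known_spectrum: dict mapping DFT index -> measured coefficient (low frequencies,
                    with both k and n - k present so the signal stays real)
    support:        set of sample indices where the signal may be nonzero
    """
    # Start from the zero-filled solution: measured coefficients, zeros elsewhere.
    x = [v.real for v in idft([known_spectrum.get(k, 0j) for k in range(n)])]
    for _ in range(iterations):
        # Spatial constraint: the image is zero outside its support region.
        x = [x[t] if t in support else 0.0 for t in range(n)]
        X = dft(x)
        # Spectral constraint: keep the measured low-frequency coefficients.
        for k, v in known_spectrum.items():
            X[k] = v
        x = [v.real for v in idft(X)]
    return x
```

Usage: with `n = 64`, a support of samples 20 to 43, and only the DFT coefficients with `k <= 8` or `k >= 56` known, the iterate has a smaller error against the true signal than the zero-filled reconstruction. Each step is an orthogonal projection onto a constraint set that contains the true signal, so the error cannot grow. As in the abstract, noisy spectral data would call for a regularized variant rather than enforcing the measured coefficients exactly.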
View details for Web of Science ID A1995RU69200009
View details for PubMedID 18215853
ALTERNATIVE K-SPACE SAMPLING DISTRIBUTIONS FOR MR SPECTROSCOPIC IMAGING
1994 IEEE International Conference on Image Processing (ICIP-94)
IEEE COMPUTER SOC. 1994: 11–14
View details for Web of Science ID A1994BC13E00003
RESOLUTION IMPROVEMENT FOR IN VIVO MAGNETIC-RESONANCE SPECTROSCOPIC IMAGES
CONF ON MEDICAL IMAGING 5 : IMAGE PROCESSING
SPIE - INT SOC OPTICAL ENGINEERING. 1991: 118–127
View details for Web of Science ID A1991BT62G00014