Olshen's research is in statistics and their applications to medicine and biology. Many efforts have concerned tree-structured algorithms for classification, regression, survival analysis, and clustering. Those for classification have been used with success in computer-aided diagnosis and prognosis, while those for clustering have been applied to lossy data compression in digital radiography. Modeling and sample reuse methods have been developed for longitudinal data, concerning gait analysis; renal physiology; cholesterol; nephrophysiology; and recently, molecular genetics.

Administrative Appointments

  • Director, Laboratory for Mathematics and Statistics, University of California, San Diego (1982 - 1989)
  • Director, Biostatistics Unit, UCSD Cancer Center (1978 - 1989)
  • Chief, Division of Biostatistics, Department of Health Research and Policy, Stanford (1998 - Present)

Honors & Awards

  • Fellowship, John Simon Guggenheim Memorial Foundation (1987-88)
  • Fellow, Institute of Mathematical Statistics (1973)
  • Fellow, American Association for the Advancement of Science (1990)
  • Fellow, American Statistical Association (1996)
  • Fellow, Institute of Electrical and Electronics Engineers (IEEE) (2006)

Professional Education

  • Ph.D., Yale University, Statistics (1966)

All Publications

  • Diversity and clonal selection in the human T-cell repertoire PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Qi, Q., Liu, Y., Cheng, Y., Glanville, J., Zhang, D., Lee, J., Olshen, R. A., Weyand, C. M., Boyd, S. D., Goronzy, J. J. 2014; 111 (36): 13139-13144
  • Insulin Resistance: Regression and Clustering PLOS ONE Yoon, S., Assimes, T. L., Quertermous, T., Hsiao, C., Chuang, L., Hwu, C., Rajaratnam, B., Olshen, R. A. 2014; 9 (6)
    In this paper we try to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend upon body mass index (BMI) or age, although in other studies, with particular random effects models quite different from models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal through application of Gauss mixture vector quantization (GMVQ), a technique for clustering that was developed for application to lossy data compression. Defining data come from measurements that play major roles in medical practice. A precise statement of what the data are is in Section 1. Their family structures are described in detail. They concern levels of lipids and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of outcomes of an OGTT and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data supplemented by insights from other approaches leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are IR, and the other of women who seem not to be. Genes and other features are used to predict cluster membership. We argue that prediction with "main effects" is not satisfactory, but prediction that includes interactions may be.

    View details for DOI 10.1371/journal.pone.0094129

    View details for PubMedID 24887437

  • Assessing gene-level translational control from ribosome profiling BIOINFORMATICS Olshen, A. B., Hsieh, A. C., Stumpf, C. R., Olshen, R. A., Ruggero, D., Taylor, B. S. 2013; 29 (23): 2995-3002


    The translational landscape of diverse cellular systems remains largely uncharacterized. A detailed understanding of the control of gene expression at the level of messenger RNA translation is vital to elucidating a systems-level view of complex molecular programs in the cell. Establishing the degree to which such post-transcriptional regulation can mediate specific phenotypes is similarly critical to elucidating the molecular pathogenesis of diseases such as cancer. Recently, methods for massively parallel sequencing of ribosome-bound fragments of messenger RNA have begun to uncover genome-wide translational control at codon resolution. Despite its promise for deeply characterizing mammalian proteomes, few analytical methods exist for the comprehensive analysis of this paired RNA and ribosome data. We describe the Babel framework, an analytical methodology for assessing the significance of changes in translational regulation within cells and between conditions. This approach facilitates the analysis of translation genome-wide while allowing statistically principled gene-level inference. Babel is based on an errors-in-variables regression model that uses the negative binomial distribution and draws inference using a parametric bootstrap approach. We demonstrate the operating characteristics of Babel on simulated data and use its gene-level inference to extend prior analyses significantly, discovering new translationally regulated modules under mammalian target of rapamycin (mTOR) pathway signaling control.

    View details for DOI 10.1093/bioinformatics/btt533

    View details for Web of Science ID 000327508300006

    View details for PubMedID 24048356

  • Risk of Cardiovascular Disease from Antiretroviral Therapy for HIV: A Systematic Review PLOS ONE Bavinger, C., Bendavid, E., Niehaus, K., Olshen, R. A., Olkin, I., Sundaram, V., Wein, N., Holodniy, M., Hou, N., Owens, D. K., Desai, M. 2013; 8 (3)
    Recent studies suggest certain antiretroviral therapy (ART) drugs are associated with increases in cardiovascular disease. We performed a systematic review and meta-analysis to summarize the available evidence, with the goal of elucidating whether specific ART drugs are associated with an increased risk of myocardial infarction (MI). We searched Medline, Web of Science, the Cochrane Library, and abstract archives from the Conference on Retroviruses and Opportunistic Infections and International AIDS Society up to June 2011 to identify published articles and abstracts. Eligible studies were comparative and included MI, strokes, or other cardiovascular events as outcomes. Eligibility screening, data extraction, and quality assessment were performed independently by two investigators. Random effects methods and Fisher's combined probability test were used to summarize evidence. Twenty-seven studies met inclusion criteria, with 8 contributing to a formal meta-analysis. Findings based on two observational studies indicated an increase in risk of MI for patients recently exposed (usually defined as within last 6 months) to abacavir (RR 1.92, 95% CI 1.51-2.42) and protease inhibitors (PI) (RR 2.13, 95% CI 1.06-4.28). Our analysis also suggested an increased risk associated with each additional year of exposure to indinavir (RR 1.11, 95% CI 1.05-1.17) and lopinavir (RR 1.22, 95% CI 1.01-1.47). Our findings of increased cardiovascular risk from abacavir and PIs were in contrast to four published meta-analyses based on secondary analyses of randomized controlled trials, which found no increased risk from cardiovascular disease. Although observational studies implicated specific drugs, the evidence is mixed. Further, meta-analyses of randomized trials did not find increased risk from abacavir and PIs. Our findings that implicate specific ARTs in the observational setting provide sufficient evidence to warrant further investigation of this relationship in studies designed for that purpose.

    View details for DOI 10.1371/journal.pone.0059551

    View details for PubMedID 23555704
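The abstract above names Fisher's combined probability test as one of its summary tools. A minimal sketch of the textbook form of that test follows. It assumes independent p-values; the function name and the example p-values are hypothetical and are not taken from the paper.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p_i) is chi-square with 2k degrees of
    freedom under the null. For even degrees of freedom the upper tail
    has the closed form exp(-h) * sum_{j<k} h**j / j!, with h = stat / 2.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values from three independent studies (illustration only).
example_p_values = [0.04, 0.10, 0.20]
combined = fisher_combined_p(example_p_values)
print(f"combined p-value: {combined:.4f}")
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail.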

  • Successive Standardization: Application to Case-Control Studies TOPICS IN APPLIED STATISTICS Rajaratnam, B., Oh, S., Tsiang, M. T., Olshen, R. A. 2013; 55: 229-239
  • Persistence versus Reversion of 3TC Resistance in HIV-1 Determine the Rate of Emergence of NVP Resistance VIRUSES-BASEL Rath, B. A., Olshen, R. A., Halpern, J., Merigan, T. C. 2012; 4 (8): 1212-1234


    When HIV-1 is exposed to lamivudine (3TC) at inhibitory concentrations, resistant variants carrying the reverse transcriptase (RT) substitution M184V emerge rapidly. This substitution confers high-level 3TC resistance and increased RT fidelity. We established a novel in vitro system to study the effect of starting nevirapine (NVP) in 3TC-resistant/NNRTI-naïve clinical isolates, and the impact of maintaining versus dropping 3TC pressure in this setting. Because M184V mutant HIV-1 seems hypersusceptible to adefovir (ADV), we also tested the effect of ADV pressure on the same isolates. We draw four conclusions from our experiments simulating combination therapy in vitro. (1) The presence of low-dose (1 μM) 3TC prevented reversal to wild-type from an M184V mutant background. (2) Adding low-dose 3TC in the presence of NVP delayed the selection of NVP-associated mutations. (3) The presence of ADV, in addition to NVP, led to more rapid reversal to wild-type at position 184 than NVP alone. (4) ADV plus NVP selected for greater numbers of mutations than NVP alone. Inference about the "selection of mutation" is based on two statistical models, one at the viral level, more telling, and the other at the level of predominance of mutation within a population. Multidrug pressure experiments lend understanding to mechanisms of HIV resistance as they bear upon new treatment strategies.

    View details for DOI 10.3390/v4081212

    View details for Web of Science ID 000308213000003

    View details for PubMedID 23012621

  • Significance analysis of xMap cytokine bead arrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Won, J., Goldberger, O., Shen-Orr, S. S., Davis, M. M., Olshen, R. A. 2012; 109 (8): 2848-2853


    Highly multiplexed assays using antibody coated, fluorescent (xMap) beads are widely used to measure quantities of soluble analytes, such as cytokines and antibodies in clinical and other studies. Current analyses of these assays use methods based on standard curves that have limitations in detecting low or high abundance analytes. Here we describe SAxCyB (Significance Analysis of xMap Cytokine Beads), a method that uses fluorescence measurements of individual beads to find significant differences between experimental conditions. We show that SAxCyB outperforms conventional analysis schemes in both sensitivity (low fluorescence) and robustness (high variability) and has enabled us to find many new differentially expressed cytokines in published studies.

    View details for DOI 10.1073/pnas.1112599109

    View details for Web of Science ID 000300495100041

    View details for PubMedID 22323610

  • Successive Standardization of Rectangular Arrays. Algorithms Olshen, R. A., Rajaratnam, B. 2012; 5 (1): 98-112


    In this note we illustrate and develop further, with mathematics and examples, the work on successive standardization (or normalization) that was studied earlier by the same authors in [1] and [2]. Thus, we deal with successive iterations applied to rectangular arrays of numbers, where to avoid technical difficulties an array has at least three rows and at least three columns. Without loss, an iteration begins with operations on columns: first subtract the mean of each column; then divide by its standard deviation. The iteration continues with the same two operations done successively for rows. These four operations applied in sequence complete one iteration. One then iterates again, and again, and again, … In [1] it was argued that if arrays are made up of real numbers, then the set for which convergence of these successive iterations fails has Lebesgue measure 0. The limiting array has row and column means 0, row and column standard deviations 1. A basic result on convergence given in [1] is true, though the argument in [1] is faulty. The result is stated in the form of a theorem here, and the argument for the theorem is correct. Moreover, many graphics given in [1] suggest that except for a set of entries of any array with Lebesgue measure 0, convergence is very rapid, eventually exponentially fast in the number of iterations. Because we learned this set of rules from Bradley Efron, we call it "Efron's algorithm". More importantly, the rapidity of convergence is illustrated by numerical examples.

    View details for PubMedID 23355926
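The iteration described in the abstract above (standardize columns, then rows, and repeat) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses the population standard deviation, a hypothetical stopping tolerance and iteration cap, and a small random 5 x 6 array as input. All function names are invented for the example.

```python
import random
from statistics import fmean, pstdev


def standardize_rows(rows):
    """Give every row mean 0 and population standard deviation 1."""
    result = []
    for row in rows:
        mean, sd = fmean(row), pstdev(row)
        result.append([(value - mean) / sd for value in row])
    return result


def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


def successive_standardization(array, tol=1e-10, max_iter=5000):
    """Alternate column and row standardization until the array stops changing.

    Returns the final array and the number of iterations performed.
    The tolerance and iteration cap are illustrative choices.
    """
    current = [[float(value) for value in row] for row in array]
    for iteration in range(1, max_iter + 1):
        # Column step: subtract each column mean, divide by its SD.
        by_columns = transpose(standardize_rows(transpose(current)))
        # Row step: the same two operations applied to rows.
        updated = standardize_rows(by_columns)
        change = max(
            abs(new - old)
            for new_row, old_row in zip(updated, current)
            for new, old in zip(new_row, old_row)
        )
        current = updated
        if change < tol:
            return current, iteration
    return current, max_iter


# Hypothetical input: a 5 x 6 array of standard normal draws.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(5)]
limit, n_iter = successive_standardization(data)
print(f"stopped after {n_iter} iterations")
```

Because each pass ends with the row step, the rows of the returned array are standardized essentially exactly; the column means and standard deviations approach 0 and 1 only as the iteration converges.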

  • SNPs and Other Features as They Predispose to Complex Disease: Genome-Wide Predictive Analysis of a Quantitative Phenotype for Hypertension PLOS ONE Won, J., Ehret, G., Chakravarti, A., Olshen, R. A. 2011; 6 (11)


    Though recently they have fallen into some disrepute, genome-wide association studies (GWAS) have been formulated and applied to understanding essential hypertension. The principal goal here is to use data gathered in a GWAS to gauge the extent to which SNPs and their interactions with other features can be combined to predict mean arterial blood pressure (MAP) in 3138 pre-menopausal and naturally post-menopausal white women. More precisely, we quantify the extent to which data as described permit prediction of MAP beyond what is possible from traditional risk factors such as blood cholesterol levels and glucose levels. Of course, these traditional risk factors are genetic, though typically not explicitly so. In all, there were 44 such risk factors/clinical variables measured and 377,790 single nucleotide polymorphisms (SNPs) genotyped. Data for women we studied are from first visit measurements taken as part of the Atherosclerotic Risk in Communities (ARIC) study. We begin by assessing non-SNP features in their abilities to predict MAP, employing a novel regression technique with two stages, first the discovery of main effects and next discovery of their interactions. The long list of SNPs genotyped is reduced to a manageable list for combining with non-SNP features in prediction. We adapted Efron's local false discovery rate to produce this reduced list. Selected non-SNP and SNP features and their interactions are used to predict MAP using adaptive linear regression. We quantify quality of prediction by an estimated coefficient of determination (R²). We compare the accuracy of prediction with and without information from SNPs.

    View details for DOI 10.1371/journal.pone.0027891

    View details for Web of Science ID 000298168100023

    View details for PubMedID 22140480

  • Parent-specific copy number in paired tumor-normal studies using circular binary segmentation BIOINFORMATICS Olshen, A. B., Bengtsson, H., Neuvial, P., Spellman, P. T., Olshen, R. A., Seshan, V. E. 2011; 27 (15): 2038-2046


    High-throughput techniques facilitate the simultaneous measurement of DNA copy number at hundreds of thousands of sites on a genome. Older techniques allow measurement only of total copy number, the sum of the copy number contributions from the two parental chromosomes. Newer single nucleotide polymorphism (SNP) techniques can in addition enable quantifying parent-specific copy number (PSCN). The raw data from such experiments are two-dimensional, but are unphased. Consequently, inference based on them necessitates development of new analytic methods. We have adapted and enhanced the circular binary segmentation (CBS) algorithm for this purpose with focus on paired test and reference samples. The essence of paired parent-specific CBS (Paired PSCBS) is to utilize the original CBS algorithm to identify regions of equal total copy number and then to further segment these regions where there have been changes in PSCN. For the final set of regions, calls are made of equal parental copy number and loss of heterozygosity (LOH). PSCN estimates are computed both before and after calling. The methodology is evaluated by simulation and on glioblastoma data. In the simulation, PSCBS compares favorably to established methods. On the glioblastoma data, PSCBS identifies interesting genomic regions, such as copy-neutral LOH. The Paired PSCBS method is implemented in an open-source R package named PSCBS, available on CRAN (

    View details for DOI 10.1093/bioinformatics/btr329

    View details for Web of Science ID 000292778700003

    View details for PubMedID 21666266

  • Five Blood Pressure Loci Identified by an Updated Genome-Wide Linkage Scan: Meta-Analysis of the Family Blood Pressure Program AMERICAN JOURNAL OF HYPERTENSION Simino, J., Shi, G., Kume, R., Schwander, K., Province, M. A., Gu, C. C., Kardia, S., Chakravarti, A., Ehret, G., Olshen, R. A., Turner, S. T., Ho, L., Zhu, X., Jaquish, C., Paltoo, D., Cooper, R. S., Weder, A., Curb, J. D., Boerwinkle, E., Hunt, S. C., Rao, D. C. 2011; 24 (3): 347-354


    A preliminary genome-wide linkage analysis of blood pressure in the Family Blood Pressure Program (FBPP) was reported previously. We harnessed the power and ethnic diversity of the final pooled FBPP dataset to identify novel loci for blood pressure thereby enhancing localization of genes containing less common variants with large effects on blood pressure levels and hypertension. We performed one overall and 4 race-specific meta-analyses of genome-wide blood pressure linkage scans using data on 4,226 African-American, 2,154 Asian, 4,229 Caucasian, and 2,435 Mexican-American participants (total N = 13,044). Variance components models were fit to measured (raw) blood pressure levels and two types of antihypertensive medication adjusted blood pressure phenotypes within each of 10 subgroups defined by race and network. A modified Fisher's method was used to combine the P values for each linkage marker across the 10 subgroups. Five quantitative trait loci (QTLs) were detected on chromosomes 6p22.3, 8q23.1, 20q13.12, 21q21.1, and 21q21.3 based on significant linkage evidence (defined by logarithm of odds (lod) score ≥3) in at least one meta-analysis and lod scores ≥1 in at least 2 subgroups defined by network and race. The chromosome 8q23.1 locus was supported by Asian-, Caucasian-, and Mexican-American-specific meta-analyses. The new QTLs reported justify new candidate gene studies. They may help support results from genome-wide association studies (GWAS) that fall in these QTL regions but fail to achieve the genome-wide significance.

    View details for DOI 10.1038/ajh.2010.238

    View details for Web of Science ID 000287386900018

    View details for DOI 10.1214/10-AOAS385

    Standard statistical techniques often require transforming data to have mean 0 and standard deviation 1. Typically, this process of "standardization" or "normalization" is applied across subjects when each subject produces a single number. High throughput genomic and financial data often come as rectangular arrays, where each coordinate in one direction concerns subjects, who might have different status (case or control, say); and each coordinate in the other designates "outcome" for a specific feature, for example "gene," "polymorphic site," or some aspect of financial profile. It may happen when analyzing data that arrive as a rectangular array that one requires BOTH the subjects and features to be "on the same footing." Thus, there may be a need to standardize across rows and columns of the rectangular matrix. There arises the question as to how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization, which we learned from colleague Bradley Efron. We also study the implementation of the method on simulated data and also on data that arose from scientific experimentation.

    View details for DOI 10.1214/09-AOS743

    View details for Web of Science ID 000277471000013

    View details for PubMedID 20473354

  • Th17 and Th1 T-Cell Responses in Giant Cell Arteritis CIRCULATION Deng, J., Younge, B. R., Olshen, R. A., Goronzy, J. J., Weyand, C. M. 2010; 121 (7): 906-U107


    In giant cell arteritis (GCA), vasculitic damage of the aorta and its branches is combined with a syndrome of intense systemic inflammation. Therapeutically, glucocorticoids remain the gold standard because they promptly and effectively suppress acute manifestations; however, they fail to eradicate vessel wall infiltrates. The effects of glucocorticoids on the systemic and vascular components of GCA are not understood. The immunoprofile of untreated and glucocorticoid-treated GCA was examined in peripheral blood and temporal artery biopsies with protein quantification assays, flow cytometry, quantitative real-time polymerase chain reaction, and immunohistochemistry. Plasma interferon-gamma and interleukin (IL)-17 and frequencies of interferon-gamma-producing and IL-17-producing T cells were markedly elevated before therapy. Glucocorticoid treatment suppressed the Th17 but not the Th1 arm in the blood and the vascular lesions. Analysis of monocytes/macrophages in the circulation and in temporal arteries revealed glucocorticoid-mediated suppression of Th17-promoting cytokines (IL-1beta, IL-6, and IL-23) but sparing of Th1-promoting cytokines (IL-12). In human artery-severe combined immunodeficiency mouse chimeras, in which patient-derived T cells cause inflammation of engrafted human temporal arteries, glucocorticoids were similarly selective in inhibiting Th17 cells and leaving Th1 cells unaffected. Two pathogenic pathways mediated by Th17 and Th1 cells contribute to the systemic and vascular manifestations of GCA. IL-17-producing Th17 cells are sensitive to glucocorticoid-mediated suppression, but interferon-gamma-producing Th1 responses persist in treated patients. Targeting steroid-resistant Th1 responses will be necessary to resolve chronic smoldering vasculitis. Monitoring Th17 and Th1 frequencies can aid in assessing disease activity in GCA.

    View details for DOI 10.1161/CIRCULATIONAHA.109.872903

    View details for Web of Science ID 000274797500011

    View details for PubMedID 20142449

  • New models and online calculator for predicting non-sentinel lymph node status in sentinel lymph node positive breast cancer patients Kohrt, H., Olshen, R., Bermas, H., Goodson, W., Henry, S., Rouse, R., Bailey, L., Philben, V., Dirbas, F., Dunn, J., Johnson, D., Wapnir, I., Carlson, R., Stockdale, F., Hansen, N., Jeffrey, S. SPRINGER. 2008: 588-588
  • New models and online calculator for predicting non-sentinel lymph node status in sentinel lymph node positive breast cancer patients BMC CANCER Kohrt, H. E., Olshen, R. A., Bermas, H. R., Goodson, W. H., Wood, D. J., Henry, S., Rouse, R. V., Bailey, L., Philben, V. J., Dirbas, F. M., Dunn, J. J., Johnson, D. L., Wapnir, I. L., Carlson, R. W., Stockdale, F. E., Hansen, N. M., Jeffrey, S. S. 2008; 8


    View details for DOI 10.1186/1471-2407-8-66

    View details for Web of Science ID 000255935500001

    View details for PubMedID 18315887

  • Associations Among Multiple Markers and Complex Disease: Models, Algorithms, and Applications GENETIC DISSECTION OF COMPLEX TRAITS, 2ND EDITION Assimes, T. L., Olshen, A. B., Narasimhan, B., Olshen, R. A. 2008; 60: 437-464


    This chapter is a report on collaborations among its authors and others over many years. It devolves from our goal of understanding genes, their main and epistatic effects combined with interactions involving demographic and environmental features also, as together they predict genetically complex diseases. Thus, our goal is "association." Particular phenotypes of interest to us are hypertension, insulin resistance, angina, and myocardial infarction. Prediction of complex disease is notoriously difficult, though it would be made easier were we given strand-specific information on genotype. Unfortunately, with current technology, genotypic information comes to us "unphased." While obviously we have strand-specific information when genotype is homozygous, we do not have such information when genotype is heterozygous. To summarize, the ultimate goal of the approaches we provide is to predict phenotype, typically untoward or not, within a specific window of time. Our approach is neither through linkage nor from finding haplotype frequencies per se.

    View details for DOI 10.1016/S0065-2660(07)00416-6

    View details for Web of Science ID 000280575900018

    View details for PubMedID 18358329

  • Orthopaedic surgery core curriculum: Foot and ankle reconstruction FOOT & ANKLE INTERNATIONAL Wadey, V. M., Halpern, J., Younger, A. S., Dev, P., Olshen, R. A., Walker, D. 2007; 28 (7): 831-837


    The purpose of this study was to develop a core curriculum for orthopaedic surgery and to conduct a national survey to assess the importance of 281 curriculum items. Attention was focused on 45 items pertaining to the foot and ankle. A 281-item curriculum was developed. A content review and cross-sectional survey of a random selection of orthopaedic surgeons with primary nonacademic affiliations was completed. Data were analyzed descriptively and quantitatively using histograms, modified Hotelling's T²-statistic, and the Benjamini-Hochberg procedure. Our analyses assumed that each respondent answered questions independently of the answers of any other respondent but that the answers to different questions by the same respondent might be dependent. Of the 156 orthopaedic surgeons contacted, 131 (86%) participated in this study. Eighty-two percent (37 of 45) of the items were ranked by respondents with an average mean score higher than 3.5/4.0 and 42 higher than 3.0/4.0, thus suggesting that 93% of the items are important or probably important to know by the end of residency (p

    View details for DOI 10.3113/FAI.2007.0831

    View details for Web of Science ID 000247849200010

    View details for PubMedID 17666177
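The survey analysis above cites the Benjamini-Hochberg procedure for handling many simultaneous tests. A minimal sketch of the standard step-up rule is given below. The function name, the false discovery rate level `q`, and the example p-values are hypothetical; the paper's own data and settings are not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the step-up rule.

    Sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten survey items (illustration only).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold.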

  • An investigation of genome-wide associations of hypertension with microsatellite markers in the Family Blood Pressure Program (FBPP) HUMAN GENETICS Gu, C. C., Hunt, S. C., Kardia, S., Turner, S. T., Chakravarti, A., Schork, N., Olshen, R., Curb, D., Jaquish, C., Boerwinkle, E., Rao, D. C. 2007; 121 (5): 577-590


    The Family Blood Pressure Program (FBPP) has data on 387 microsatellite markers in 13,524 subjects from four major ethnic groups. We investigated genetic association with hypertension of the linkage markers. Family-based methods were used to test association of the 387 loci with resting blood pressures (BPs) [systolic blood pressure (SBP) and diastolic blood pressure (DBP)] and the hypertension status (HT). We applied a vote-counting approach to pool results across the three correlated traits, network samples, and ethnic groups to refine the selection of susceptibility loci. The association analyses captured signals missed by previous linkage scans. We found 71 loci associated with at least one of the three traits in at least one of the four ethnic groups at the significance level of 0.01. After validation across multiple samples and related traits, we identified by vote-counting 21 candidate loci for hypertension. Two loci, D3S2459 and D10S1412, confirmed findings in Network-specific linkage scans (GENOA and SAPPHIRe). Many of the candidate loci were reported by others in linkage to BPs, body weight, heart disease, and diabetes. We also observed frequent presence of quantitative trait loci (QTLs) involved in autoimmune and neurological disorders (e.g., NOD2). The vote-counting method of pooling results recognizes the potential that a gene may be involved in varying ways among different samples, which we believe is responsible for identifying genes in the less explored inflammatory pathways to hypertension.

    View details for DOI 10.1007/s00439-007-0349-8

    View details for Web of Science ID 000246272400006

    View details for PubMedID 17372766

  • Heritability of left ventricular mass in Japanese families living in Hawaii: the SAPPHIRe Study JOURNAL OF HYPERTENSION Assimes, T. L., Narasimhan, B., Seto, T. B., Yoon, S., Curb, J. D., Olshen, R. A., Quertermous, T. 2007; 25 (5): 985-992


    Established determinants of left ventricular (LV) mass explain only a modest fraction of its variability. Family studies to date suggest that a proportion of the unexplained variability can be accounted for by additive polygenic effects. An estimate of this proportion has not been reported previously in an East Asian population. The objective of this study was to estimate the heritability of LV mass in Japanese families living in Hawaii. We analyzed data by components of variance in a sample of 169 hypertensive families (n = 476 subjects) and, separately, in a population-based sample of 256 families (n = 501 subjects) participating in the Honolulu Heart Program. In multivariate models, established predictors of LV mass explained about half the total variance of LV mass. Using SOLAR, our estimates of the narrow sense heritability of LV mass ranged from 42.5% (SE 9.8, P < 0.0001) in our sample of hypertensive families to 60.6% (SE 11.7, P < 0.0001) in our population-based sample of families. Parametric bootstrap analyses confirmed that the inference for each sample was appropriate. Assuming the absence of shared familial environmental effects, close to half of the unexplained variance of LV mass in Japanese subjects living in Hawaii is genetic in nature. This estimate was observed in two independent samples. Therefore, the pursuit of novel genetic determinants of LV mass through either whole genome or candidate gene association studies of this population may be worthwhile. Such studies are certainly feasible.

    View details for Web of Science ID 000245741200015

    View details for PubMedID 17414662

  • Orthopaedic surgery core curriculum: the spine POSTGRADUATE MEDICAL JOURNAL Wadey, V. M., Halpern, J., Bouchard, J., Dev, P., Olshen, R. A., Walker, D. 2007; 83 (978): 268-272


    To develop a core curriculum for orthopaedic surgery and to conduct a national survey to assess the importance of 281 items in the curriculum. Attention was focused specifically on 24 items pertaining to the curriculum that are pertinent to the spine. A cross-sectional survey of a random sample of orthopaedic surgeons whose primary affiliation was non-academic, representing the provinces and territories of Canada. A questionnaire containing 281 items was developed. A random group of 131 (out of 156) orthopaedic surgeons whose primary affiliation is non-academic completed the questionnaire. The data were analysed quantitatively using average mean scores, histograms, the modified Hotelling's T² test and the Benjamini-Hochberg procedure. 131 of 156 (84%) orthopaedic surgeons participated in this study. 14 of 24 items were ranked at no less than 3 out of 4, thus suggesting that 58% of the items are important or probably important to know by the end of residency (SD ≤ 0.07). Residents need to learn the diagnosis and principles of managing patients with common conditions of the spine. The study shows, with reliable statistical evidence, that orthopaedic residents are no longer expected to be able to perform spinal fusions with proficiency on completion of residency. Is the exposure to surgical spine problems and the ability to be comfortable with operating expectations specific to the fellowship level? If so, the focus during residency or increasing accredited spine fellowships needs to be addressed to ensure that enough spine surgeons are educated to meet the future healthcare demands projected for Canada.

    View details for DOI 10.1136/pgmj.2006.053900

    View details for Web of Science ID 000245394400012

    View details for PubMedID 17403955

  • Canadian multidisciplinary core curriculum for musculoskeletal health JOURNAL OF RHEUMATOLOGY Wadey, V. M., Tang, E., Abelseth, G., Dev, P., Olshen, R. A., Walker, D. 2007; 34 (3): 567-580


    To determine the level of agreement among the Bone and Joint Decade Undergraduate Curriculum Group (BJDUCG) core curriculum recommendations for musculoskeletal (MSK) conditions targeted for undergraduate medical education and what the physicians and surgeons of Canada thought to be important at the postgraduate level of education. An 80-item questionnaire was developed. A cross-sectional survey of educators representing 77 Canadian accredited academic programs representing 6 disciplines in medicine that manage patients with MSK conditions was completed. Histograms, Kruskal-Wallis, and principal component analyses were computed. In total, 164/175 (94%) respondents participated in the study. All 80 curriculum items received a mean score of at least 3.0/4.0. Sixty-four out of 80 items were ranked to be at least 3.5/4.0, and 35 items were ranked to be at least 3.8/4.0, suggesting that these items may be core content for all disciplines. The World Health Organization declared the years 2000 to 2010 as The Bone and Joint Decade. The main goal is to improve the quality of life for people with MSK disorders worldwide. One aim of the BJD is to increase education of healthcare providers at all levels. The BJDUCG established a set of core curriculum recommendations for MSK conditions. Our study gives reliable statistical evidence of agreement among what the BJDUCG recommended for an MSK core curriculum for medical schools and what the physicians and surgeons of Canada thought to be important for residency education in several disciplines.

    View details for Web of Science ID 000244613800019

    View details for PubMedID 17183615

  • Tree-structured regression and the differentiation of integrals ANNALS OF STATISTICS Olshen, R. A. 2007; 35 (1): 1-12
  • Directly measured insulin resistance and the assessment of clustered cardiovascular risks in hypertension AMERICAN JOURNAL OF HYPERTENSION Lin, M., Hwu, C., Huang, Y., Sheu, W. H., Shih, K., Chiang, F., Olshen, R., Chen, Y. I., Curb, J. D., Rodriguez, B., Ho, L. 2006; 19 (11): 1118-1124


    The purpose of the study was to use factor analysis to investigate the contribution of a directly measured insulin sensitivity index, steady-state plasma glucose (SSPG) from insulin suppression test (IST), to a clustering of cardiovascular risk factors in hypertensive subjects. A total of 204 nondiabetic hypertensive patients who received IST for SSPG were included for current analysis. Factor analysis was performed to explore the contribution of SSPG as additional information to a clustering of risk factors in these subjects. In factor analysis, SSPG aggregated with metabolic variables in an obesity-hyperinsulinemia domain that included two factors: one with positive loadings for SSPG, 2-h glucose, and Log 2-h insulin; and the other with positive loadings for body mass index, waist circumference, and fasting glucose. Fasting insulin linked the two factors together and explained 38.3% of the total variance. Systolic and diastolic blood pressures were loaded on a blood pressure domain separately. The third domain consisted of two factors: one with positive loadings for Log triglycerides and negative loading for high-density lipoprotein cholesterol; and the other with positive loadings for Log triglycerides and non-high-density lipoprotein cholesterol. The model loaded without SSPG explained a proportion of the total variance (78.5%) similar to that achieved with the model loaded with SSPG (77.1%). Directly measured insulin sensitivity index SSPG clustered with 2-h glucose and Log 2-h insulin in factor analysis in a cohort consisting entirely of hypertensive subjects. However, the contribution of SSPG as additional information to explain the total variance seems to be insignificant.

    View details for DOI 10.1016/j.amjhyper.2006.04.003

    View details for Web of Science ID 000242142400006

    View details for PubMedID 17070421

  • Genome-wide linkage scans for loci affecting total cholesterol, HDL-C, and triglycerides: the Family Blood Pressure Program HUMAN GENETICS Bielinski, S. J., Tang, W., Pankow, J. S., Miller, M. B., Mosley, T. H., Boerwinkle, E., Olshen, R. A., Curb, J. D., Jaquish, C. E., Rao, D. C., Weder, A., Arnett, D. K. 2006; 120 (3): 371-380


    Atherosclerosis accounts for 75% of all deaths from cardiovascular disease and includes coronary heart disease (CHD), stroke, and other diseases of the arteries. More than half of all CHD is attributable to abnormalities in levels and metabolism of lipids. To locate genes that affect total cholesterol, high density lipoprotein cholesterol (HDL-C), and triglycerides, genome-wide linkage scans for quantitative trait loci were performed using variance components methods as implemented in SOLAR on a large diverse sample recruited as part of the Family Blood Pressure Program. Phenotype and genetic marker data were available for 9,299 subjects in 2,953 families for total cholesterol, 8,668 subjects in 2,736 families for HDL, and 7,760 subjects in 2,499 families for triglycerides. Mean lipid levels were adjusted for the effects of sex, age, age², age-by-sex interaction, body mass index, smoking status, and field center. HDL-C and triglycerides were further adjusted for average total alcoholic drinks per week and estrogen use. Significant linkage was found for total cholesterol on chromosome 2 (LOD = 3.1 at 43 cM) in Hispanics and for HDL-C on chromosome 3 (LOD = 3.0 at 182 cM) and 12 (LOD = 3.5 at 124 cM) in Asians. In addition, there were 13 regions that showed suggestive linkage (LOD ≥ 2.0); 7 for total cholesterol, 4 for HDL, and 2 for triglycerides. The identification of these loci affecting lipid phenotypes and the apparent congruence with previous linkage results provides increased support that these regions contain genes influencing lipid levels.

    View details for DOI 10.1007/s00439-006-0223-0

    View details for Web of Science ID 000240613900005

    View details for PubMedID 16868761

  • Predicting non-sentinel lymph node involvement in breast cancer patients. Kohrt, H. E., Olshen, R. A., Goodson, W. H., Rouse, R. V., Bailey, L., Philben, V., Dirbas, F. M., Stockdale, F. E., Carlson, R. W., Jeffrey, S. S. AMER SOC CLINICAL ONCOLOGY. 2006: 10S-10S
  • Nonparametric supervised learning by linear interpolation with maximum entropy IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gupta, M. R., Gray, R. M., Olshen, R. A. 2006; 28 (5): 766-781


    Nonparametric neighborhood methods for learning entail estimation of class conditional probabilities based on relative frequencies of samples that are "near-neighbors" of a test point. We propose and explore the behavior of a learning algorithm that uses linear interpolation and the principle of maximum entropy (LIME). We consider some theoretical properties of the LIME algorithm: LIME weights have exponential form; the estimates are consistent; and the estimates are robust to additive noise. In relation to bias reduction, we show that near-neighbors contain a test point in their convex hull asymptotically. The common linear interpolation solution used for regression on grids or look-up-tables is shown to solve a related maximum entropy problem. LIME simulation results support use of the method, and performance on a pipeline integrity classification problem demonstrates that the proposed algorithm has practical value.

    View details for Web of Science ID 000235885700008

    View details for PubMedID 16640262

  • Systematic review: A century of inhalational anthrax cases from 1900 to 2005 ANNALS OF INTERNAL MEDICINE Holty, J. E., Bravata, D. M., Liu, H., Olshen, R. A., McDonald, K. M., Owens, D. K. 2006; 144 (4): 270-280


    Mortality from inhalational anthrax during the 2001 U.S. attack was substantially lower than that reported historically. To systematically review all published inhalational anthrax case reports to evaluate the predictors of disease progression and mortality. MEDLINE (1966-2005), 14 selected journal indexes (1900-1966), and bibliographies of all retrieved articles. Case reports (in any language) between 1900 and 2005 that met predefined criteria. Two authors (1 author for non-English-language reports) independently abstracted patient data. The authors found 106 reports of 82 cases of inhalational anthrax. Mortality was statistically significantly lower for patients receiving antibiotics or anthrax antiserum during the prodromal phase of disease, multidrug antibiotic regimens, or pleural fluid drainage. Patients in the 2001 U.S. attack were less likely to die than historical anthrax case-patients (45% vs. 92%; P < 0.001) and were more likely to receive antibiotics during the prodromal phase (64% vs. 13%; P < 0.001), multidrug regimens (91% vs. 50%; P = 0.027), or pleural fluid drainage (73% vs. 11%; P < 0.001). Patients who progressed to the fulminant phase had a mortality rate of 97% (regardless of the treatment they received), and all patients with anthrax meningoencephalitis died. This was a retrospective case review of previously published heterogeneous reports. Despite advances in supportive care, fulminant-phase inhalational anthrax is usually fatal. Initiation of antibiotic or anthrax antiserum therapy during the prodromal phase is associated with markedly improved survival, although other aspects of care, differences in clinical circumstances, or unreported factors may contribute to this observed reduction in mortality. Efforts to improve early diagnosis and timely initiation of appropriate antibiotics are critical to reducing mortality.

    View details for Web of Science ID 000235543100006

    View details for PubMedID 16490913

  • Modeling GFR trajectories in diabetic nephropathy AMERICAN JOURNAL OF PHYSIOLOGY-RENAL PHYSIOLOGY Lemley, K. V., Boothroyd, D. B., Blouch, K. L., Nelson, R. G., Jones, L. I., Olshen, R. A., Myers, B. D. 2005; 289 (4): F863-F870


    In an 8-year longitudinal study of Pima Indians with type 2 diabetes and nephropathy, we used statistical techniques that are novel and depend on minimal assumptions to compare longitudinal measurements of glomerular filtration rate (GFR). Individuals enrolled with new-onset microalbuminuria either progressed to macroalbuminuria (progressors, n = 13) or did not progress (nonprogressors, n = 13) during follow-up. Subjects with new-onset macroalbuminuria at screening were also followed (n = 22). Patients had their GFR determined serially by urinary iothalamate clearances (average 11 clearances; range 6-19). GFR courses of individuals were modeled using an adaptation of smoothing and regression cubic B-splines. Group comparisons were based on five-component vectors of fitted GFR values using a permutation approach to a Hotelling's T² statistic. GFR profiles of initially microalbuminuric progressors differed significantly from those of nonprogressors (P = 0.003). There were no significant baseline differences between progressors and nonprogressors with respect to any measured clinical parameters. The course of GFR in the first 4 yr following progression to macroalbuminuria in initially microalbuminuric subjects did not differ from that in newly screened macroalbuminuric subjects (P = 0.27). Without imposing simplifying models on the data, the statistical techniques used demonstrate that the courses of decline of GFR in definable subgroups of initially microalbuminuric diabetic Pima Indians, although generally progressive, follow distinct trajectories that are related to the extent of glomerular barrier dysfunction, as reflected by the evolution from microalbuminuria to macroalbuminuria.

    View details for DOI 10.1152/ajprenal.00068.2004

    View details for Web of Science ID 000231833300023

    View details for PubMedID 15900022

  • Vector quantization of amino acids: Analysis of the HIV V3 loop region JOURNAL OF STATISTICAL PLANNING AND INFERENCE Olsen, A. B., Cosman, P. C., Rodrigo, A. G., Bickel, P. J., Olshen, R. A. 2005; 130 (1-2): 277-298
  • Who is most likely to benefit from tPA? The perfusion-diffusion and clinical-diffusion mismatch models disagree Lansberg, M. G., Thijs, V. N., Bammer, R., Wechsler, L. R., O'Donnell, M. J., Olshen, R. A., Wijman, C. A., Kemp, S. M., Albers, G. W. LIPPINCOTT WILLIAMS & WILKINS. 2005: 437-437
  • Signature pattern of circulating chemokines can improve the identification of coronary artery disease Ardigo, D., Tabibiazar, R., Olshen, R., Tsao, P. S., Quertermous, T. SPRINGER. 2005: A406-A407
  • Single nucleotide polymorphisms in protein tyrosine phosphatase 1 beta (PTPN1) are associated with essential hypertension and obesity HUMAN MOLECULAR GENETICS Olivier, M., Hsiung, C. A., Chuang, L. M., Ho, L. T., Ting, C. T., Bustos, V. I., Lee, T. M., de Witte, A., Chen, Y. D., Olshen, R., Rodriguez, B., Wen, C. C., Cox, D. R. 2004; 13 (17): 1885-1892


    Protein tyrosine phosphatase 1beta (PTP-1beta) is involved in the regulation of several important physiological pathways. It regulates both insulin and leptin signaling, and interacts with the epidermal- and platelet-derived growth factor receptors. The gene is located on human chromosome 20q13, and several rare single nucleotide polymorphisms (SNPs) have been shown to be associated with insulin resistance and diabetes in different populations. As part of our ongoing investigations into the genetic basis of hypertension, we examined common sequence variants in the gene for association with hypertension, obesity and altered lipid profile in two populations of Japanese and Chinese descent. We re-sequenced all exons, selected intronic sequences and the promoter region in 24 individuals from our cohort. Fourteen SNPs were discovered, and six of these spanning 78 kb were genotyped in 1553 individuals from 672 families. All six SNPs were in linkage disequilibrium, and we found strong association of common risk haplotypes with hypertension in Chinese and Japanese (P<0.0001). In addition, individual SNPs showed association to total plasma cholesterol, LDL-cholesterol and VLDL-cholesterol levels, as well as obesity measures (body mass index). This analysis supports that PTP-1beta affects plasma lipid levels, and may lead to obesity and hypertension in Japanese and Chinese. Given similar associations found in other populations to insulin resistance and diabetes, this gene may play a crucial role in the development of the characteristic metabolic changes seen in patients with the metabolic syndrome.

    View details for DOI 10.1093/hmg/ddh196

    View details for Web of Science ID 000223720100006

    View details for PubMedID 15229188

  • Canadian interdisciplinary core curriculum for musculoskeletal health - The rheumatologist perspective. Wadey, V. M., Tang, A., Olshen, R., Walker, D. WILEY-BLACKWELL. 2004: S475-S475
  • Tree-structured supervised learning and the genetics of hypertension Proc Natl Acad Sci USA Huang J, Lin A, Narasimhan B, Quertermous T, Hsiung CA, ... , Risch NJ, Olshen RA 2004; 101 (29): 10529-10534
  • Combined analysis of genomewide scans for adult height: results from the NHLBI family blood pressure program EUROPEAN JOURNAL OF HUMAN GENETICS Wu, X. D., Cooper, R. S., Boerwinkle, E., Turner, S. T., Hunt, S., Myers, R., Olshen, R. A., Curb, D., Zhu, X. F., Kan, D. H., Luke, A. 2003; 11 (3): 271-274


    A combined analysis of genome scans was performed for adult height in the NHLBI Family Blood Pressure Program. Height data were available on 6752 individuals. Linkage analysis was performed first separately for each of the eight ethnic groups in the four networks using the variance component method. To increase the power to detect the common genetic components affecting height for all the individuals, a linkage analysis was performed subsequently for the combined data set by pooling the average allele-sharing IBD () for all groups. By combining the data, we replicated evidence for a QTL influencing adult height on chromosome 7 (7q31) (LOD=2.46), which has been reported in two previous studies. Suggestive linkage (LOD>1) was found in another six regions in our combined analysis. Evidence for linkage for two of these regions (2p12, 20p11) has also been reported previously.

    View details for DOI 10.1038/sj.ejhg.5200952

    View details for Web of Science ID 000182189800010

    View details for PubMedID 12673281

  • Loss of GFR in type 2 diabetic Pima Indians with albuminuria detected at screening. Lemley, K. V., Boothroyd, D. B., Blouch, K., Nelson, R. G., Lois, J., Olshen, R. A., Myers, B. D. AMER SOC NEPHROLOGY. 2002: 645A-645A
  • Risk estimation for classification trees JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS Bloch, D. A., Olshen, R. A., Walker, M. G. 2002; 11 (2): 263-288
  • Genetic variation in aldosterone synthase predicts plasma glucose levels PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Ranade, K., Wu, K. D., Risch, N., Olivier, M., Pei, D., Hsiao, C. F., Chuang, L. M., Ho, L. T., Jorgenson, E., Pesich, R., Chen, Y. D., Dzau, V., Lin, A., Olshen, R. A., Curb, D., Cox, D. R., Botstein, D. 2001; 98 (23): 13219-13224


    The mineralocorticoid hormone, aldosterone, is known to play a role in sodium homeostasis. We serendipitously found, however, highly significant association between single-nucleotide polymorphisms in the aldosterone synthase gene and plasma glucose levels in a large population of Chinese and Japanese origin. Two polymorphisms--one in the putative promoter (T-344C) and another resulting in a lysine/arginine substitution at amino acid 173, which are in complete linkage disequilibrium in this population--were associated with fasting plasma glucose levels (P = 0.000017) and those 60 (P = 0.017) and 120 (P = 0.0019) min after an oral glucose challenge. A C/T variant in intron 1, between these polymorphisms, was not associated with glucose levels. Arg-173 and -344C homozygotes were most likely to be diabetic [odds ratio 2.51; 95% confidence interval (C.I.) 1.39-3.92; P = 0.0015] and have impaired fasting glucose levels (odds ratio 3.53; 95% C.I. 2.02-5.5; P = 0.0000036). These results suggest a new role for aldosterone in glucose homeostasis.

    View details for Web of Science ID 000172076800070

    View details for PubMedID 11687612

  • A conversation with Leo Breiman STATISTICAL SCIENCE Olshen, R. 2001; 16 (2): 184-198
  • Multiresolution image classification by hierarchical modeling with two-dimensional hidden Markov models IEEE TRANSACTIONS ON INFORMATION THEORY Li, J., Gray, R. M., Olshen, R. A. 2000; 46 (5): 1826-1841
  • Intracellular reduction of selenite into glutathione peroxidase. Evidence for involvement of NADPH and not glutathione as the reductant MOLECULAR AND CELLULAR BIOCHEMISTRY Bhamre, S., Nuzzo, R. L., Whitin, J. C., Olshen, R. A., Cohen, H. J. 2000; 211 (1-2): 9-17


    Selenium (Se) in selenite is present in an oxidized state, and must be reduced for it to be incorporated as selenocysteine into selenoenzymes such as glutathione peroxidase (GPx). In vitro, Se, as in selenite, can be reduced utilizing glutathione (GSH) and glutathione reductase (GRed). We determined the effects of decreasing GSH levels, inhibiting GRed activity, and decreasing cellular NADPH on the selenite-dependent rate of GPx synthesis in cultured cells: PC3, CHO, and the E89 glucose-6-phosphate dehydrogenase (G-6-PD)-deficient cell line. A novel statistical analysis method was developed (using Box-Cox transformed regression and a bootstrap method) in order to assess the effects of these manipulations singly and in combinations. Buthionine sulfoximine (BSO) was used to decrease GSH levels, 1,3-bis-(2-chloroethyl)-1-nitrosourea (BCNU) was used to inhibit GRed activity and methylene blue (MB) was used to decrease cellular NADPH levels. This statistical method evaluates the effects of BSO, BCNU, MB and selenite alone and in combinations on GPx activity. Decreasing the GSH level (< 5% of control) did not have an effect on the selenite-dependent rate of GPx synthesis in PC3 or CHO cells, but did have a small inhibitory effect on the rate of GPx synthesis in E89 cells. Inhibiting GRed activity was also associated with either no effect (CHO, E89) or a small effect (PC3) on GPx activity. In contrast, decreasing NADPH levels in cells treated with MB was associated with a large decrease in the selenite-dependent rate of GPx synthesis to 36, 34 and 25% of control in PC3, CHO, and E89 cells, respectively. The effects of BSO plus BCNU were not synergistic in any of the cell lines. The effects of BSO plus MB were synergistic in G-6-PD-deficient E89 cells, but not in PC3 or CHO cells. We therefore conclude that under normal culture conditions, NADPH, and not glutathione, is the primary reductant of Se in selenite to forms that are eventually incorporated into GPx. For cells with abnormal ability to generate NADPH, lowering the GSH levels had a small effect on selenite-dependent GPx synthesis. GRed activity is not required for the selenite-dependent synthesis of GPx.

    View details for Web of Science ID 000089137800002

    View details for PubMedID 11055542

  • Do patient characteristics explain practice variability in the diagnosis and treatment of febrile infants? Bergman, D. A., Pantell, R. H., Lin, A., Mayer, M., Olshen, R., Wasserman, R. C. NATURE PUBLISHING GROUP. 2000: 174A-174A
  • Analysis of single nucleotide polymorphisms in candidate genes for hypertension and insulin resistance. Olivier, M., Sheu, H. H., Jeng, C. Y., Hsiao, C. F., Tseng, Y. Z., Ranade, K., Chen, Y. D., Olshen, R., Curb, D., Pratt, R., Jarvis, N., Indig, M. D., Risch, N., Cox, D. R. CELL PRESS. 1999: A438-A438
  • Joint image compression and classification with vector quantization and a two dimensional hidden Markov model DCC '99 - DATA COMPRESSION CONFERENCE, PROCEEDINGS Li, J., Gray, R. M., Olshen, R. 1999: 23-32
  • Empirically defined health states for depression from the SF-12 HEALTH SERVICES RESEARCH Sugar, C. A., Sturm, R., Lee, T. T., Sherbourne, C. D., Olshen, R. A., Wells, K. B., Lenert, L. A. 1998; 33 (4): 911-928


    To define objectively and describe a set of clinically relevant health states that encompass the typical effects of depression on quality of life in an actual patient population. Our model was designed to facilitate the elicitation of patients' and the public's values (utilities) for outcomes of depression. From the depression panel of the Medical Outcomes Study. Data include scores on the 12-Item Short Form Health Survey (SF-12) as well as independently obtained diagnoses of depression for 716 patients. Follow-up information, one year after baseline, was available for 166 of these patients. We use k-means cluster analysis to group the patients according to appropriate dimensions of health derived from the SF-12 scores. Chi-squared and exact permutation tests are used to validate the health states thus obtained, by checking for baseline and longitudinal correlation of cluster membership and clinical diagnosis. We find, on the basis of a combination of statistical and clinical criteria, that six states are optimal for summarizing the range of health experienced by depressed patients. Each state is described in terms of a subject who is typical in a sense that is articulated with our cluster-analytic approach. In all of our models, the relationship between health state membership and clinical diagnosis is highly statistically significant. The models are also sensitive to changes in patients' clinical status over time. Cluster analysis is demonstrably a powerful methodology for forming clinically valid health states from health status data. The states produced are suitable for the experimental elicitation of preference and analyses of costs and utilities.

    View details for Web of Science ID 000076304100009

    View details for PubMedID 9776942

  • Medical image compression and vector quantization STATISTICAL SCIENCE Perlmutter, S. M., Cosman, P. C., Tseng, C. W., Olshen, R. A., Gray, R. M., Li, K. C., Bergin, C. J. 1998; 13 (1): 30-53
  • Vector quantization and density estimation COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS Gray, R. M., Olshen, R. A. 1998: 172-193
  • A criterion for model selection using minimum description length COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS Najmi, A., Olshen, R. A., Gray, R. M. 1998: 204-214
  • Management and lesion detection effects of lossy image compression on digitized mammograms DIGITAL MAMMOGRAPHY Betts, B. J., Aiyer, A., Li, J., Ikeda, D., Birdwell, R., Gray, R. M., Olshen, R. A. 1998; 13: 449-456
  • Quantization, classification, and density estimation for Kohonen's Gaussian mixture DCC '98 - DATA COMPRESSION CONFERENCE Gray, R. M., Perlmutter, K. O., Olshen, R. A. 1998: 63-72
  • Modeling of progressive glomerular injury in humans with lupus nephritis AMERICAN JOURNAL OF PHYSIOLOGY-RENAL PHYSIOLOGY Buckheit, J. B., Olshen, R. A., Blouch, K., Myers, B. D. 1997; 273 (1): F158-F169


    We studied glomerular function longitudinally for 36-120 mo in 21 patients undergoing treatment for diffuse, proliferative lupus nephritis. We determined glomerular filtration rate (GFR) and glomerular oncotic pressure (ΠGC) and computed the two-kidney ultrafiltration coefficient (Kf) at 6- to 12-mo intervals. The relationships and cross talk among the three variables over time were then analyzed by eigenfunction regression and canonical correlations. We also performed a morphometric analysis of serial biopsies and computed single-nephron Kf in patent glomeruli at baseline and after 36-94 mo of follow-up. Patients were divisible into progressors (n = 12) or nonprogressors (n = 9) according to the presence or absence, respectively, of an irrevocable decline in GFR over time. Examination of longitudinal variables revealed GFR to be strongly related to Kf in all patients and inversely related to ΠGC in progressors. By serial morphometric analysis we observed a threefold increase in the prevalence of global sclerosis in progressors but unchanged prevalence in nonprogressors. Whereas single-nephron Kf of remnant glomeruli increased to supernormal levels in nonprogressors, the absence of this compensatory phenomenon in progressors permitted GFR and Kf to decline in parallel with the declining number of functional glomeruli.

    View details for Web of Science ID A1997XK45000019

    View details for PubMedID 9249604

  • Image quality in lossy compressed digital mammograms SIGNAL PROCESSING Perlmutter, S. M., Cosman, P. C., Gray, R. M., Olshen, R. A., Ikeda, D., Adams, C. N., Betts, B. J., Williams, M. B., Perlmutter, K. O., Li, J., Aiyer, A., Fajardo, L., Birdwell, R. 1997; 59 (2): 189-210
  • Covariability of V3 loop amino acids AIDS RESEARCH AND HUMAN RETROVIRUSES Bickel, P. J., Cosman, P. C., Olshen, R. A., Spector, P. C., Rodrigo, A. G., Mullins, J. I. 1996; 12 (15): 1401-1411


    We reanalyzed for covariability a set of 308 human immunodeficiency virus type 1 (HIV-1) V3 loop amino acid sequences from the B envelope sequence subtype previously analyzed by Korber et al., as well as a new set of 440 sequences that also included substantial numbers of sequences from subtypes A, D, and E. We used the measure employed by Korber et al., essentially the likelihood ratio statistic for independence, plus two additional measures as well as clade information to examine the new set and both data sets simultaneously. We set forth the following conclusions and observations. The eight most highly connected sites identified through these statistical approaches included all of the six residues previously shown to have determining roles in structure, immunologic recognition, virus phenotype, and host range; each of the seven pairs of covariant sites found by Korber et al. were signaled by our additional two measures in the set of 308 sequences, although 2 or 3 dropped out of the examination of the set of 440 when the requirement of stringent significance was applied for some or all of the three tests, respectively; using the same criteria, a total of 20 (including 5 Korber et al. pairs) or a total of 6 (including 4 Korber et al. pairs) were found when the set of 440 was added. Several limitations to statistical analysis of this type of HIV sequence data were also noted. For example, the data sets were, by historical necessity, collected haphazardly. For example, it was not possible to separate substantially sized groups out according to time of or since infection, disease status, antiviral treatment, geography, etc. There was also an enormous "wealth of significance" within the data. For example, for one measure the 440 data set showed 233 of the 465 pairs of sites with a likelihood ratio statistic of < 0.001. Last, most sites had consensus amino acids in 80% or more of the sequences; hence, there was an absence of data on many combinations of amino acids. Given the observed linkage between sites shown to be covariable and those known to have critical biological function, the statistical approaches we and Korber et al. have outlined may find use in predicting critical structural features of HIV proteins as targets for therapeutic intervention.

    View details for Web of Science ID A1996VM09500003

    View details for PubMedID 8893048

  • Bayes risk weighted vector quantization with posterior estimation for image compression and classification IEEE TRANSACTIONS ON IMAGE PROCESSING Perlmutter, K. O., Perlmutter, S. M., Gray, R. M., Olshen, R. A., Oehler, K. L. 1996; 5 (2): 347-360


    Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and low-level classification, it is not surprising that there are many similar methods for their design. Because some of these methods are useful for designing vector quantizers, it seems natural that vector quantization (VQ) is explored for the combined goal. We investigate several VQ-based algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks. These algorithms are investigated with both full search and tree-structured codes. We emphasize a nonparametric technique that minimizes both error measures simultaneously by incorporating a Bayes risk component into the distortion measure used for the design and encoding. We introduce a tree-structured posterior estimator to produce the class posterior probabilities required for the Bayes risk computation in this design. For two different image sources, we demonstrate that this system provides superior classification while maintaining compression close or superior to that of several other VQ-based designs, including Kohonen's (1992) "learning vector quantizer" and a sequential quantizer/classifier design.

    View details for Web of Science ID A1996TZ09900012

    View details for PubMedID 18285118

  • Termination and continuity of greedy growing for tree-structured vector quantizers IEEE TRANSACTIONS ON INFORMATION THEORY Nobel, A. B., Olshen, R. A. 1996; 42 (1): 191-205
  • Evaluating quality and utility of digital mammograms and lossy compressed digital mammograms Adams, D. N., Aiyer, A., Betts, B. J., Li, J., Cosman, P. C., Perlmutter, S. M., Williams, M., Perlmutter, K. O., Ikeda, D., Fajardo, L., Birdwell, R., Daniel, B. L., Rossiter, S., Olshen, R. A., Gray, R. M. ELSEVIER SCIENCE PUBL B V. 1996: 169-176
  • Text segmentation in mixed-mode images using classification trees and transform tree-structured vector quantization Perlmutter, K. O., Chaddha, N., Buckheit, J. B., Gray, R. M., Olshen, R. A. IEEE. 1996: 2231-2234
  • Predictive vector quantization with ridge regression Nash, C. L., Olshen, R. A., Gray, R. M. IEEE COMPUTER SOC PRESS. 1996: 310-319


    The authors use predictive pruned tree-structured vector quantization for the compression of medical images. Their goal is to obtain a high compression ratio without impairing the image quality, at least so far as diagnostic purposes are concerned. The authors use a priori knowledge of the class of images to be encoded to help them segment the images and thereby to reserve bits for diagnostically relevant areas. Moreover, the authors improve the quality of prediction and encoding in two additional ways: by increasing the memory of the predictor itself and by using ridge regression for prediction. The improved encoding scheme was tested via computer simulations on a set of mediastinal CT scans; results are compared with those obtained using a more conventional scheme proposed recently in the literature. There were remarkable improvements in both the prediction accuracy and the encoding quality, above and beyond what comes from the segmentation. Test images were encoded at 0.5 bit per pixel and less without any visible degradation for the diagnostically relevant region.

    View details for Web of Science ID A1995RA64500003

    View details for PubMedID 18290024

  • WAVELET SHRINKAGE - ASYMPTOPIA - DISCUSSION JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL Speckman, P. L., Marron, J. S., Silverman, B., Nason, G., Wang, K. M., Seifert, B., Gasser, T., Efromovitch, S., Nussbaum, M., Wang, Y. Z., von Sachs, R., Brillinger, D. R., Neumann, M. H., Hastie, T., Fan, J. Q., Antoniadis, A., Birge, L., Crellin, N. J., Martin, M. A., Doukhan, P., Engel, J., Georgiev, A. A., Liu, H., Good, I. J., Hall, P., Patil, P., Herrmann, E., Kolaczyk, E. D., Lepskii, O. V., Mammen, E., Spokoiny, V. G., Lucier, B., McCullagh, P., Moulin, P., Muller, H. G., Olshen, R. A., Tsybakov, A. B., Wahba, G., Walter, G. G., Tibshirani, R. 1995; 57 (2): 337-369
  • Evaluating quality and utility in digital mammography Gray, R. M., Olshen, R. A., Ikeda, D., Cosman, P. C., Perlmutter, S., Nash, C., Perlmutter, K. IEEE COMPUTER SOC. 1995: B5-B8
  • BAYES RISK WEIGHTED VECTOR QUANTIZATION WITH CART ESTIMATED CLASS POSTERIORS Perlmutter, K. O., Gray, R. M., Olshen, R. A., Perlmutter, S. M. IEEE. 1995: 2435-2438
  • PREDICTING HIGH-RISK CHOLESTEROL LEVELS INTERNATIONAL STATISTICAL REVIEW Garber, A. M., Olshen, R. A., Zhang, H. P., Venkatraman, E. S. 1994; 62 (2): 203-228


    Because each of very different treatments for Hodgkin's disease (HD) may result in a high rate of cure, attention is currently focused on toxicity. This prospective study was designed to assess the effects of mediastinal irradiation and bleomycin chemotherapy on pulmonary function. Patients were treated from 1980 to 1990 on randomized controlled trials at Stanford University. Pulmonary function was tested before treatment (baseline), early after treatment (< 15 months), and more than 36 months posttherapy. Treatment options in the 145 patients were grouped as I (mediastinal radiotherapy), II (mediastinal radiotherapy plus bleomycin), and III (bleomycin) for analyses of variance (ANOVAs). A variety of regression models were used to predict early and late effects on pulmonary function. A decrease in forced vital capacity (FVC) and diffusing capacity (DLCO) in the first 15 months after treatment followed by recovery after 36 months was observed for most patients. Patients who received mediastinal radiotherapy (RT) had a more pronounced reduction in pulmonary function and less complete recovery. Overall, 3 or more years after treatment, 32% of group I patients, 37% of group II patients, and 19% of group III patients had FVC values less than 80% of predicted, while only 7% of patients had a DLCO less than 80% of predicted. Linear regression identified baseline measurement as the only significant predictor of change in percent predicted FVC or DLCO; patients with higher baseline values had greater decrements after therapy. Mantle RT was the only significant treatment variable, predictive of FVC and DLCO within 15 months and FVC at 36 or more months. No patient experienced pulmonary toxicity severe enough to require hospitalization. This prospective analysis of pulmonary function after treatment for HD showed that mediastinal RT was the only treatment variable that achieved statistical significance. Although there were no significant interactions between mediastinal RT and bleomycin or Adriamycin (doxorubicin; Adria Laboratories, Columbus, OH) chemotherapy, the patient numbers were small after correction for mediastinal mass size and drug regimen such that an effect could have been missed. The mild reduction in pulmonary function should be factored into the overall assessment of morbidity risk for each of the potentially curative treatments included in this study. As with all reports of late effects, these data should be interpreted with respect to the population tested, details of the treatment administered, methods of measurement, and length of follow-up.

    View details for Web of Science ID A1994MW70600012

    View details for PubMedID 7509383

  • THORACIC CT IMAGES - EFFECT OF LOSSY IMAGE COMPRESSION ON DIAGNOSTIC-ACCURACY RADIOLOGY Cosman, P. C., Davidson, H. C., Bergin, C. J., Tseng, C. W., Moses, L. E., Riskin, E. A., Olshen, R. A., Gray, R. M. 1994; 190 (2): 517-524


    To evaluate the effects of lossy (noninvertible) image compression on diagnostic accuracy of thoracic computed tomographic images. Sixty images from patients with mediastinal adenopathy and pulmonary nodules were compressed to six different levels with tree-structured vector quantization. Three radiologists then used the original and compressed images for diagnosis. Unlike many previous receiver operating characteristic-based studies that used confidence rankings and binary detection tasks, this study examined the sensitivity and predictive value positive scores from nonbinary detection tasks. At the 5% significance level, there was no statistically significant difference in diagnostic accuracy of image assessment at compression rates of up to 9:1. The techniques presented for evaluation of image quality do not depend on the specific compression algorithm and provide a useful approach to evaluation of the benefits of any lossy image processing technique.

    View details for Web of Science ID A1994MW44400043

    View details for PubMedID 8284409

  • MEASUREMENT ACCURACY AS A MEASURE OF IMAGE QUALITY IN COMPRESSED MR CHEST SCANS Perlmutter, S. M., Tseng, C. W., Cosman, P. C., Li, K. C., Olshen, R. A., Gray, R. M. IEEE COMPUTER SOC PRESS. 1994: 861-865
  • Evaluating quality of compressed medical images: SNR, subjective rating, and diagnostic accuracy Proceedings of the IEEE Olshen RA, Cosman PC, Gray RM 1994
  • Predicting high-risk cholesterol levels Intl. Stat. Rev. Olshen RA, Garber AM, Zhang H, Venkatraman ES. 1994
  • TREE-STRUCTURED VECTOR QUANTIZATION WITH REGION-BASED CLASSIFICATION Perlmutter, S. M., Perlmutter, K. O., Cosman, P. C., Riskin, E. A., Olshen, R. A., Gray, R. M. IEEE COMPUTER SOC PRESS. 1992: 691-695


    Binary tree-structured statistical classification algorithms and properties of 56 model alkyl nucleophiles were brought to bear on two problems of experimental pharmacology and toxicology. Each rat of a learning sample of 745 was administered one compound and autopsied to determine the presence of duodenal ulcer or adrenal hemorrhagic necrosis. The cited statistical classification schemes were then applied to these outcomes and 67 features of the compounds to ascertain those characteristics that are associated with biologic activity. For predicting duodenal ulceration, dipole moment, melting point, and solubility in octanol are particularly important, while for predicting adrenal necrosis, important features include the number of sulfhydryl groups and double bonds. These methods may constitute inexpensive but powerful ways to screen untested compounds for possible organ-specific toxicity. Mechanisms for the etiology and pathogenesis of the duodenal and adrenal lesions are suggested, as are additional avenues for drug design.

    View details for Web of Science ID A1991FW76100074

    View details for PubMedID 2068109

  • TRAINING SEQUENCE SIZE AND VECTOR QUANTIZER PERFORMANCE Cosman, P. C., Perlmutter, K. O., Perlmutter, S. M., Olshen, R. A., Gray, R. M. IEEE COMPUTER SOC PRESS. 1991: 434-438


    While diffuse large-cell lymphoma (DLCL) is considered to be highly curable with current therapy, treatment failures are observed even with intensive combination chemotherapy regimens. In order to study the prognostic significance of actual dose intensity of chemotherapy in DLCL, we retrospectively analyzed 115 previously untreated patients treated at Stanford between 1975 and 1986 with cyclophosphamide, Adriamycin (doxorubicin; Adria Laboratories, Columbus, OH), vincristine, and prednisone (CHOP), methotrexate, bleomycin, Adriamycin, cyclophosphamide, vincristine, and dexamethasone ([M]BACOD), or methotrexate, Adriamycin, cyclophosphamide, vincristine, prednisone, and bleomycin (MACOP-B). The actual relative dose intensity (RDI), the amount of drug actually administered to each patient during the first 12 weeks of therapy, was calculated as standardized to CHOP and analyzed in addition to clinical factors prognostic for survival by univariate analysis. Multivariate recursive partitioning (tree-structured) survival analysis identified the actual RDI of Adriamycin greater than 75% as the single most important predictor of survival. A model incorporating the actual RDI of Adriamycin and performance status, in combination with serum lactate dehydrogenase (LDH) and extranodal disease, defined three overall prognostic groups of patients with respective 3-year survival rates of 89%, 63%, and 18%. The three prognostic groups remained distinct, even when restricted to complete responders. This model was also predictive of survival when dose intensity was analyzed relative to the optimum dose defined for each of the three regimens and when applied to a subgroup of patients aged 50 years or younger. We conclude that actual RDI is an important prognostic factor for survival in DLCL and that analysis of RDI early in the course of treatment may allow modification of the treatment plan.

    View details for Web of Science ID A1990DG84100004

    View details for PubMedID 2348230

  • Classification and Regression Trees Wadsworth, Belmont, CA Olshen RA, Breiman L, Friedman JH, Stone CJ 1984
  • CONSISTENT NONPARAMETRIC REGRESSION ANNALS OF STATISTICS Stone, C. J., Bickel, P. J., Breiman, L., Brillinger, D. R., BRUNK, H. D., PIERCE, D. A., Chernoff, H., COVER, T. M., Cox, D. R., Eddy, W. F., Hampel, F., Olshen, R. A., Parzen, E., Rosenblatt, M., Sacks, J., Wahba, G. 1977; 5 (4): 595-645