Jonathan Taylor
Professor of Statistics
2024-25 Courses
- Applied Statistics III
STATS 305C (Spr)
- Introduction to Statistical Methods: Precalculus
PSYCH 10, STATS 160, STATS 60 (Win)
Independent Studies (4)
- Independent Study
STATS 199 (Aut, Win, Spr)
- Independent Study
STATS 299 (Aut, Win, Spr)
- Industrial Research for Statisticians
STATS 398 (Aut, Win, Spr)
- Research
STATS 399 (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Introduction to Applied Statistics
STATS 191 (Spr)
- Introduction to Statistical Learning
STATS 216 (Win)
- Statistics Faculty Research Presentations
STATS 303 (Aut)
2022-23 Courses
- Data Mining and Analysis
STATS 202 (Aut)
- Literature of Statistics
STATS 319 (Aut)
- Statistics Faculty Research Presentations
STATS 303 (Aut)
2021-22 Courses
- Applied Statistics II
STATS 305B (Win)
- Literature of Statistics
STATS 319 (Spr)
- Statistics Faculty Research Presentations
STATS 303 (Aut)
- Introduction to Applied Statistics
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Disha Ghandwani, Rex Shen, Anav Sood, James Yang
-
Doctoral Dissertation Advisor (AC)
Kevin Fry
All Publications
-
Exact selective inference with randomization
BIOMETRIKA
2024
View details for DOI 10.1093/biomet/asae019
View details for Web of Science ID 001244029100001
-
Approximate Selective Inference via Maximum Likelihood
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2022
View details for DOI 10.1080/01621459.2022.2081575
View details for Web of Science ID 000818684200001
-
The volume-of-tube method for Gaussian random fields with inhomogeneous variance
JOURNAL OF MULTIVARIATE ANALYSIS
2022; 188
View details for DOI 10.1016/j.jmva.2021.104819
View details for Web of Science ID 000759646700036
-
Reconstructing codependent cellular cross-talk in lung adenocarcinoma using REMI.
Science advances
2022; 8 (11): eabi4757
Abstract
Cellular cross-talk in tissue microenvironments is fundamental to normal and pathological biological processes. Global assessment of cell-cell interactions (CCIs) is not yet technically feasible, but computational efforts to reconstruct these interactions have been proposed. Current computational approaches that identify CCI often make the simplifying assumption that pairwise interactions are independent of one another, which can lead to reduced accuracy. We present REMI (REgularized Microenvironment Interactome), a graph-based algorithm that predicts ligand-receptor (LR) interactions by accounting for LR dependencies on high-dimensional, small-sample size datasets. We apply REMI to reconstruct the human lung adenocarcinoma (LUAD) interactome from a bulk flow-sorted RNA sequencing dataset, then leverage single-cell transcriptomics data to increase the cell type resolution and identify LR prognostic signatures among tumor-stroma-immune subpopulations. We experimentally confirmed colocalization of CTGF:LRP6 among malignant cell subtypes as an interaction predicted to be associated with LUAD progression. Our work presents a computational approach to reconstruct interactomes and identify clinically relevant CCIs.
View details for DOI 10.1126/sciadv.abi4757
View details for PubMedID 35302849
-
INTEGRATIVE METHODS FOR POST-SELECTION INFERENCE UNDER CONVEX CONSTRAINTS
ANNALS OF STATISTICS
2021; 49 (5): 2803-2824
View details for DOI 10.1214/21-AOS2057
View details for Web of Science ID 000730635300018
-
Survival Analysis on Rare Events Using Group-Regularized Multi-Response Cox Regression.
Bioinformatics (Oxford, England)
2021
Abstract
MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank (Sudlow et al., 2015) dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. (2020). AVAILABILITY: https://github.com/rivas-lab/multisnpnet-Cox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btab095
View details for PubMedID 33560296
-
Inferactive data analysis
SCANDINAVIAN JOURNAL OF STATISTICS
2020; 47 (1): 212–49
View details for DOI 10.1111/sjos.12425
View details for Web of Science ID 000513940900010
-
Selection-Corrected Statistical Inference for Region Detection With High-Throughput Assays.
Journal of the American Statistical Association
2019; 114 (527): 1351-1365
Abstract
Scientists use high-dimensional measurement assays to detect and prioritize regions of strong signal in spatially organized domains. Examples include finding methylation enriched genomic regions using microarrays, and active cortical areas using brain-imaging. The most common procedure for detecting potential regions is to group neighboring sites where the signal passed a threshold. However, one needs to account for the selection bias induced by this procedure to avoid diminishing effects when generalizing to a population. This paper introduces pin-down inference, a model and an inference framework that permit population inference for these detected regions. Pin-down inference provides non-asymptotic point and confidence interval estimators for the mean effect in the region that account for local selection bias. Our estimators accommodate non-stationary covariances that are typical of these data, allowing researchers to better compare regions of different sizes and correlation structures. Inference is provided within a conditional one-parameter exponential family per region, with truncations that match the selection constraints. A secondary screening-and-adjustment step allows pruning the set of detected regions, while controlling the false-coverage rate over the reported regions. We apply the method to genomic regions with differing DNA-methylation rates across tissue. Our method provides superior power compared to other conditional and non-parametric approaches.
View details for DOI 10.1080/01621459.2018.1498347
View details for PubMedID 36312875
View details for PubMedCentralID PMC9615469
-
Beyond a Binary Classification of Sex: An Examination of Brain Sex Differentiation, Psychopathology, and Genotype
JOURNAL OF THE AMERICAN ACADEMY OF CHILD AND ADOLESCENT PSYCHIATRY
2019; 58 (8): 787–98
View details for DOI 10.1016/j.jaac.2018.09.425
View details for Web of Science ID 000518530600008
-
Selection-Corrected Statistical Inference for Region Detection With High-Throughput Assays
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2019; 114 (527): 1351–65
View details for DOI 10.1080/01621459.2018.1498347
View details for Web of Science ID 000489742200035
-
Kinematic formula for heterogeneous Gaussian related fields
STOCHASTIC PROCESSES AND THEIR APPLICATIONS
2019; 129 (7): 2437–65
View details for DOI 10.1016/j.spa.2018.07.013
View details for Web of Science ID 000471088700008
-
Beyond a Binary Classification of Sex: An Examination of Brain Sex Differentiation, Psychopathology, and Genotype.
Journal of the American Academy of Child and Adolescent Psychiatry
2018
Abstract
OBJECTIVE: Sex differences in the brain are traditionally treated as binary. We present new evidence that a continuous measure of sex differentiation of the brain can explain sex differences in psychopathology. The degree of sex differentiated brain features (ie, features that are more common in one sex) may predispose individuals toward sex-biased psychopathology and may also be influenced by the genome. We hypothesized that individuals with a female-biased differentiation score would have greater female-biased psychopathology (internalizing symptoms, such as anxiety and depression), whereas individuals with a male-biased differentiation score would have greater male-biased psychopathology (externalizing symptoms, such as disruptive behaviors). METHOD: Using the Philadelphia Neurodevelopmental Cohort database acquired from the database of Genotypes and Phenotypes, we calculated the sex differentiation measure, a continuous data-driven calculation of each individual's degree of sex differentiating features extracted from multimodal brain imaging data (magnetic resonance imaging (MRI)/diffusion MRI) from the imaged participants (n=866, 407F/459M). RESULTS: In males, higher differentiation scores were correlated with higher levels of externalizing symptoms (r=0.119, p=0.016). The differentiation measure reached genome-wide association study significance (p<5×10⁻⁸) in males with single nucleotide polymorphisms Chromosome5:rs111161632:RASGEF1C and Chromosome19:rs75918199:GEMIN7, and in females with Chromosome2:rs78372132:PARD3B and Chromosome15:rs73442006:HCN4. CONCLUSION: The sex differentiation measure provides an initial topography of quantifying male and female brain features. This demonstration that the sex of the human brain can be conceptualized on a continuum has implications for both the presentation of psychopathology and the relation of the brain with genetic variants that may be associated with brain differentiation.
View details for PubMedID 30768381
-
Convergence of the reach for a sequence of Gaussian-embedded manifolds
PROBABILITY THEORY AND RELATED FIELDS
2018; 171 (3-4): 1045–91
View details for DOI 10.1007/s00440-017-0801-1
View details for Web of Science ID 000438793600010
-
SELECTIVE INFERENCE WITH A RANDOMIZED RESPONSE
ANNALS OF STATISTICS
2018; 46 (2): 679–710
View details for DOI 10.1214/17-AOS1564
View details for Web of Science ID 000431125400009
-
Post-Selection Inference for ℓ1-Penalized Likelihood Models.
The Canadian journal of statistics = Revue canadienne de statistique
2018; 46 (1): 41-61
Abstract
We present a new method for post-selection inference for ℓ1 (lasso)-penalized likelihood models, including generalized regression models. Our approach generalizes the post-selection framework presented in Lee et al. (2013). The method provides p-values and confidence intervals that are asymptotically valid, conditional on the inherent selection done by the lasso. We present applications of this work to (regularized) logistic regression, Cox's proportional hazards model and the graphical lasso. We do not provide rigorous proofs here of the claimed results, but rather conceptual and theoretical sketches.
View details for DOI 10.1002/cjs.11313
View details for PubMedID 30127543
View details for PubMedCentralID PMC6097808
-
Post-selection inference for ℓ1-penalized likelihood models
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE
2018; 46 (1): 41–61
View details for DOI 10.1002/cjs.11313
View details for Web of Science ID 000425130100004
-
A General Framework for Estimation and Inference From Clusters of Features
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2018; 113 (521): 280–93
View details for DOI 10.1080/01621459.2016.1246368
View details for Web of Science ID 000438960500030
-
Scalable methods for Bayesian selective inference
ELECTRONIC JOURNAL OF STATISTICS
2018; 12 (2): 2355–2400
View details for DOI 10.1214/18-EJS1452
View details for Web of Science ID 000460450800010
-
SELECTING THE NUMBER OF PRINCIPAL COMPONENTS: ESTIMATION OF THE TRUE RANK OF A NOISY MATRIX
ANNALS OF STATISTICS
2017; 45 (6): 2590–2617
View details for DOI 10.1214/16-AOS1536
View details for Web of Science ID 000418371600011
-
Asymptotics of Selective Inference
SCANDINAVIAN JOURNAL OF STATISTICS
2017; 44 (2): 480-499
View details for DOI 10.1111/sjos.12261
View details for Web of Science ID 000400985000009
-
Post-selection point and interval estimation of signal sizes in Gaussian samples
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE
2017; 45 (2): 128-148
View details for DOI 10.1002/cjs.11320
View details for Web of Science ID 000400027400001
-
Topological consistency via kernel estimation
BERNOULLI
2017; 23 (1): 288-328
View details for DOI 10.3150/15-BEJ744
View details for Web of Science ID 000389565500011
-
The Intrinsic geometry of some random manifolds
ELECTRONIC COMMUNICATIONS IN PROBABILITY
2017; 22
View details for DOI 10.1214/16-ECP4763
View details for Web of Science ID 000396606700001
-
Sparse Steinian Covariance Estimation
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
2017; 26 (2): 355-366
View details for DOI 10.1080/10618600.2016.1209117
View details for Web of Science ID 000400182800012
-
Communication-efficient Sparse Regression
JOURNAL OF MACHINE LEARNING RESEARCH
2017; 18
View details for Web of Science ID 000397018800001
-
High-dimensional regression adjustments in randomized experiments.
Proceedings of the National Academy of Sciences of the United States of America
2016
Abstract
We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation and flexible nonparametric regression adjustments with machine-learning methods such as random forests or neural networks.
View details for PubMedID 27791165
-
EXACT POST-SELECTION INFERENCE, WITH APPLICATION TO THE LASSO
ANNALS OF STATISTICS
2016; 44 (3): 907-927
View details for DOI 10.1214/15-AOS1371
View details for Web of Science ID 000375175200001
-
Exact Post-Selection Inference for Sequential Regression Procedures
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2016; 111 (514): 600-614
View details for DOI 10.1080/01621459.2015.1108848
View details for Web of Science ID 000381326700012
-
INFERENCE IN ADAPTIVE REGRESSION VIA THE KAC-RICE FORMULA
ANNALS OF STATISTICS
2016; 44 (2): 743-770
View details for DOI 10.1214/15-AOS1386
View details for Web of Science ID 000372594300011
-
Statistical learning and selective inference
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2015; 112 (25): 7629-7634
Abstract
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked" (searched for the strongest associations) means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
View details for DOI 10.1073/pnas.1507583112
View details for Web of Science ID 000356731300047
View details for PubMedID 26100887
View details for PubMedCentralID PMC4485109
-
Geographic and temporal trends in the molecular epidemiology and genetic mechanisms of transmitted HIV-1 drug resistance: an individual-patient- and sequence-level meta-analysis.
PLoS medicine
2015; 12 (4)
Abstract
Regional and subtype-specific mutational patterns of HIV-1 transmitted drug resistance (TDR) are essential for informing first-line antiretroviral (ARV) therapy guidelines and designing diagnostic assays for use in regions where standard genotypic resistance testing is not affordable. We sought to understand the molecular epidemiology of TDR and to identify the HIV-1 drug-resistance mutations responsible for TDR in different regions and virus subtypes. We reviewed all GenBank submissions of HIV-1 reverse transcriptase sequences with or without protease and identified 287 studies published between March 1, 2000, and December 31, 2013, with more than 25 recently or chronically infected ARV-naïve individuals. These studies comprised 50,870 individuals from 111 countries. Each set of study sequences was analyzed for phylogenetic clustering and the presence of 93 surveillance drug-resistance mutations (SDRMs). The median overall TDR prevalence in sub-Saharan Africa (SSA), south/southeast Asia (SSEA), upper-income Asian countries, Latin America/Caribbean, Europe, and North America was 2.8%, 2.9%, 5.6%, 7.6%, 9.4%, and 11.5%, respectively. In SSA, there was a yearly 1.09-fold (95% CI: 1.05-1.14) increase in odds of TDR since national ARV scale-up attributable to an increase in non-nucleoside reverse transcriptase inhibitor (NNRTI) resistance. The odds of NNRTI-associated TDR also increased in Latin America/Caribbean (odds ratio [OR] = 1.16; 95% CI: 1.06-1.25), North America (OR = 1.19; 95% CI: 1.12-1.26), Europe (OR = 1.07; 95% CI: 1.01-1.13), and upper-income Asian countries (OR = 1.33; 95% CI: 1.12-1.55). In SSEA, there was no significant change in the odds of TDR since national ARV scale-up (OR = 0.97; 95% CI: 0.92-1.02). An analysis limited to sequences with mixtures at less than 0.5% of their nucleotide positions (a proxy for recent infection) yielded trends comparable to those obtained using the complete dataset. Four NNRTI SDRMs (K101E, K103N, Y181C, and G190A) accounted for >80% of NNRTI-associated TDR in all regions and subtypes. Sixteen nucleoside reverse transcriptase inhibitor (NRTI) SDRMs accounted for >69% of NRTI-associated TDR in all regions and subtypes. In SSA and SSEA, 89% of NNRTI SDRMs were associated with high-level resistance to nevirapine or efavirenz, whereas only 27% of NRTI SDRMs were associated with high-level resistance to zidovudine, lamivudine, tenofovir, or abacavir. Of 763 viruses with TDR in SSA and SSEA, 725 (95%) were genetically dissimilar; 38 (5%) formed 19 sequence pairs. Inherent limitations of this study are that some cohorts may not represent the broader regional population and that studies were heterogeneous with respect to duration of infection prior to sampling. Most TDR strains in SSA and SSEA arose independently, suggesting that ARV regimens with a high genetic barrier to resistance combined with improved patient adherence may mitigate TDR increases by reducing the generation of new ARV-resistant strains. A small number of NNRTI-resistance mutations were responsible for most cases of high-level resistance, suggesting that inexpensive point-mutation assays to detect these mutations may be useful for pre-therapy screening in regions with high levels of TDR. In the context of a public health approach to ARV therapy, a reliable point-of-care genotypic resistance test could identify which patients should receive standard first-line therapy and which should receive a protease-inhibitor-containing regimen.
View details for DOI 10.1371/journal.pmed.1001810
View details for PubMedID 25849352
-
On model selection consistency of regularized M-estimators
ELECTRONIC JOURNAL OF STATISTICS
2015; 9 (1): 608-642
View details for DOI 10.1214/15-EJS1013
View details for Web of Science ID 000366268800019
-
A SIGNIFICANCE TEST FOR THE LASSO
ANNALS OF STATISTICS
2014; 42 (2): 413-468
View details for DOI 10.1214/13-AOS1175
View details for Web of Science ID 000336888400001
-
A SIGNIFICANCE TEST FOR THE LASSO.
Annals of statistics
2014; 42 (2): 413-468
Abstract
In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²₁ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²₁ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties (adaptivity and shrinkage), and its null distribution is tractable and asymptotically Exp(1).
View details for DOI 10.1214/13-AOS1175
View details for PubMedID 25574062
View details for PubMedCentralID PMC4285373
-
A Generalized Least-Square Matrix Decomposition
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2014; 109 (505): 145-159
View details for DOI 10.1080/01621459.2013.852978
View details for Web of Science ID 000333787300012
-
Non-nucleoside reverse transcriptase inhibitor (NNRTI) cross-resistance: implications for preclinical evaluation of novel NNRTIs and clinical genotypic resistance testing
JOURNAL OF ANTIMICROBIAL CHEMOTHERAPY
2014; 69 (1): 12-20
Abstract
The introduction of two new non-nucleoside reverse transcriptase inhibitors (NNRTIs) in the past 5 years and the identification of novel NNRTI-associated mutations have made it necessary to reassess the extent of phenotypic NNRTI cross-resistance. We analysed a dataset containing 1975, 1967, 519 and 187 genotype-phenotype correlations for nevirapine, efavirenz, etravirine and rilpivirine, respectively. We used linear regression to estimate the effects of RT mutations on susceptibility to each of these NNRTIs. Sixteen mutations at 10 positions were significantly associated with the greatest contribution to reduced phenotypic susceptibility (≥10-fold) to one or more NNRTIs, including: 14 mutations at six positions for nevirapine (K101P, K103N/S, V106A/M, Y181C/I/V, Y188C/L and G190A/E/Q/S); 10 mutations at six positions for efavirenz (L100I, K101P, K103N, V106M, Y188C/L and G190A/E/Q/S); 5 mutations at four positions for etravirine (K101P, Y181I/V, G190E and F227C); and 6 mutations at five positions for rilpivirine (L100I, K101P, Y181I/V, G190E and F227C). G190E, a mutation that causes high-level nevirapine and efavirenz resistance, also markedly reduced susceptibility to etravirine and rilpivirine. K101H, E138G, V179F and M230L mutations, associated with reduced susceptibility to etravirine and rilpivirine, were also associated with reduced susceptibility to nevirapine and/or efavirenz. The identification of novel cross-resistance patterns among approved NNRTIs illustrates the need for a systematic approach for testing novel NNRTIs against clinical virus isolates with major NNRTI-resistance mutations and for testing older NNRTIs against virus isolates with mutations identified during the evaluation of a novel NNRTI.
View details for DOI 10.1093/jac/dkt316
View details for Web of Science ID 000328425400002
View details for PubMedID 23934770
View details for PubMedCentralID PMC3861329
-
DETECTING SPARSE CONE ALTERNATIVES FOR GAUSSIAN RANDOM FIELDS, WITH AN APPLICATION TO fMRI
STATISTICA SINICA
2013; 23 (4): 1629-1656
View details for DOI 10.5705/ss.2012-218s
View details for Web of Science ID 000339125900011
-
The geometry of least squares in the 21st century
BERNOULLI
2013; 19 (4): 1449-1464
View details for DOI 10.3150/12-BEJSP15
View details for Web of Science ID 000324346200017
-
RANDOM FIELDS AND THE GEOMETRY OF WIENER SPACE
ANNALS OF PROBABILITY
2013; 41 (4): 2724-2754
View details for DOI 10.1214/11-AOP730
View details for Web of Science ID 000322353200010
-
A LASSO FOR HIERARCHICAL INTERACTIONS
ANNALS OF STATISTICS
2013; 41 (3): 1111-1141
View details for DOI 10.1214/13-AOS1096
View details for Web of Science ID 000321847600003
-
A LASSO FOR HIERARCHICAL INTERACTIONS.
Annals of statistics
2013; 41 (3): 1111-1141
Abstract
We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity (the number of nonzero coefficients) and practical sparsity (the number of raw variables one must measure to make a new prediction). Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
View details for DOI 10.1214/13-AOS1096
View details for PubMedID 26257447
View details for PubMedCentralID PMC4527358
-
Interpretable whole-brain prediction analysis with GraphNet
NEUROIMAGE
2013; 72: 304-321
Abstract
Multivariate machine learning methods are increasingly used to analyze neuroimaging data, often replacing more traditional "mass univariate" techniques that fit data one voxel at a time. In the functional magnetic resonance imaging (fMRI) literature, this has led to broad application of "off-the-shelf" classification and regression methods. These generic approaches allow investigators to use ready-made algorithms to accurately decode perceptual, cognitive, or behavioral states from distributed patterns of neural activity. However, when applied to correlated whole-brain fMRI data these methods suffer from coefficient instability, are sensitive to outliers, and yield dense solutions that are hard to interpret without arbitrary thresholding. Here, we develop variants of the Graph-constrained Elastic-Net (GraphNet), a fast, whole-brain regression and classification method developed for spatially and temporally correlated data that automatically yields interpretable coefficient maps (Grosenick et al., 2009b). GraphNet methods yield sparse but structured solutions by combining structured graph constraints (based on knowledge about coefficient smoothness or connectivity) with a global sparsity-inducing prior that automatically selects important variables. Because GraphNet methods can efficiently fit regression or classification models to whole-brain, multiple time-point data sets and enhance classification accuracy relative to volume-of-interest (VOI) approaches, they eliminate the need for inherently biased VOI analyses and allow whole-brain fitting without the multiple comparison problems that plague mass univariate and roaming VOI ("searchlight") methods. 
As fMRI data are unlikely to be normally distributed, we (1) extend GraphNet to include robust loss functions that confer insensitivity to outliers, (2) equip them with "adaptive" penalties that asymptotically guarantee correct variable selection, and (3) develop a novel sparse structured Support Vector GraphNet classifier (SVGN). When applied to previously published data (Knutson et al., 2007), these efficient whole-brain methods significantly improved classification accuracy over previously reported VOI-based analyses on the same data (Grosenick et al., 2008; Knutson et al., 2007) while discovering task-related regions not documented in the original VOI approach. Critically, GraphNet estimates fit to the Knutson et al. (2007) data generalize well to out-of-sample data collected more than three years later on the same task but with different subjects and stimuli (Karmarkar et al., submitted for publication). By enabling robust and efficient selection of important voxels from whole-brain data taken over multiple time points (>100,000 "features"), these methods enable data-driven selection of brain areas that accurately predict single-trial behavior within and across individuals.
View details for DOI 10.1016/j.neuroimage.2012.12.062
View details for Web of Science ID 000317166800030
View details for PubMedID 23298747
-
HIGH LEVEL EXCURSION SET GEOMETRY FOR NON-GAUSSIAN INFINITELY DIVISIBLE RANDOM FIELDS
ANNALS OF PROBABILITY
2013; 41 (1): 134-169
View details for DOI 10.1214/11-AOP738
View details for Web of Science ID 000315072000004
-
ROTATION AND SCALE SPACE RANDOM FIELDS AND THE GAUSSIAN KINEMATIC FORMULA
ANNALS OF STATISTICS
2012; 40 (6): 2910-2942
View details for DOI 10.1214/12-AOS1055
View details for Web of Science ID 000321845400006
-
Standardized Comparison of the Relative Impacts of HIV-1 Reverse Transcriptase (RT) Mutations on Nucleoside RT Inhibitor Susceptibility
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2012; 56 (5): 2305-2313
Abstract
Determining the phenotypic impacts of reverse transcriptase (RT) mutations on individual nucleoside RT inhibitors (NRTIs) has remained a statistical challenge because clinical NRTI-resistant HIV-1 isolates usually contain multiple mutations, often in complex patterns, complicating the task of determining the relative contribution of each mutation to HIV drug resistance. Furthermore, the NRTIs have highly variable dynamic susceptibility ranges, making it difficult to determine the relative effect of an RT mutation on susceptibility to different NRTIs. In this study, we analyzed 1,273 genotyped HIV-1 isolates for which phenotypic results were obtained using the PhenoSense assay (Monogram, South San Francisco, CA). We used a parsimonious feature selection algorithm, LASSO, to assess the possible contributions of 177 mutations that occurred in 10 or more isolates in our data set. We then used least-squares regression to quantify the impact of each LASSO-selected mutation on each NRTI. Our study provides a comprehensive view of the most common NRTI resistance mutations. Because our results were standardized, the study provides the first analysis that quantifies the relative phenotypic effects of NRTI resistance mutations on each of the NRTIs. In addition, the study contains new findings on the relative impacts of thymidine analog mutations (TAMs) on susceptibility to abacavir and tenofovir; the impacts of several known but incompletely characterized mutations, including E40F, V75T, Y115F, and K219R; and a tentative role in reduced NRTI susceptibility for K64H, a novel NRTI resistance mutation.
View details for DOI 10.1128/AAC.05487-11
View details for Web of Science ID 000302790400015
View details for PubMedID 22330916
View details for PubMedCentralID PMC3346663
-
DEGREES OF FREEDOM IN LASSO PROBLEMS
ANNALS OF STATISTICS
2012; 40 (2): 1198-1232
View details for DOI 10.1214/12-AOS1003
View details for Web of Science ID 000307608000021
-
Strong rules for discarding predictors in lasso-type problems.
Journal of the Royal Statistical Society. Series B, Statistical methodology
2012; 74 (2): 245-266
Abstract
We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed 'SAFE' rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush-Kuhn-Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
View details for DOI 10.1111/j.1467-9868.2011.01004.x
View details for PubMedID 25506256
View details for PubMedCentralID PMC4262615
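The screen-then-check idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the lasso objective (1/2)||y - Xb||^2 + lam*||b||_1 and the "basic" strong rule, which discards predictor j when |x_j'y| < 2*lam - lam_max with lam_max = max_j |x_j'y|. Neither formula appears in the abstract, and the function names are hypothetical.

```python
def inner(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def basic_strong_rule(columns, y, lam):
    """Indices of predictors that survive the basic strong rule.

    Assumed rule: discard j when |x_j'y| < 2*lam - lam_max.
    `columns` is a list of predictor columns x_j.
    """
    scores = [abs(inner(x, y)) for x in columns]
    lam_max = max(scores)
    threshold = 2 * lam - lam_max
    return [j for j, s in enumerate(scores) if s >= threshold]


def kkt_violations(columns, residual, lam, discarded, tol=1e-9):
    """Discarded predictors that violate the KKT condition.

    A coefficient can only be zero at the lasso solution when
    |x_j'(y - Xb)| <= lam, so a violation means j was wrongly discarded.
    """
    return [j for j in discarded
            if abs(inner(columns[j], residual)) > lam + tol]


# Orthonormal toy design, so the lasso solution is soft-thresholding.
columns = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [3.0, 1.0, 0.5]
lam = 2.5
kept = basic_strong_rule(columns, y, lam)
discarded = [j for j in range(len(columns)) if j not in kept]
# Fitted coefficients are (0.5, 0, 0), so the residual is y - X b:
residual = [2.5, 1.0, 0.5]
violations = kkt_violations(columns, residual, lam, discarded)
```

In a real solver, any predictor listed in `violations` would be added back and the reduced problem solved again until no violations remain.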
-
Strong rules for discarding predictors in lasso-type problems
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2012; 74: 245-266
View details for DOI 10.1111/j.1467-9868.2011.01004.x
View details for Web of Science ID 000301286200004
View details for PubMedCentralID PMC4262615
-
THE SOLUTION PATH OF THE GENERALIZED LASSO
ANNALS OF STATISTICS
2011; 39 (3): 1335-1371
View details for DOI 10.1214/11-AOS878
View details for Web of Science ID 000293716500001
-
Algebraic Topology of Excursion Sets: A New Challenge
TOPOLOGICAL COMPLEXITY OF SMOOTH RANDOM FUNCTIONS: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XXXIX - 2009
2011; 2019: 107-114
View details for DOI 10.1007/978-3-642-19580-8_6
View details for Web of Science ID 000343974900007
-
The Gaussian Kinematic Formula
TOPOLOGICAL COMPLEXITY OF SMOOTH RANDOM FUNCTIONS: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XXXIX - 2009
2011; 2019: 59-85
View details for DOI 10.1007/978-3-642-19580-8_4
View details for Web of Science ID 000343974900005
-
On Applications: Topological Inference
TOPOLOGICAL COMPLEXITY OF SMOOTH RANDOM FUNCTIONS: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XXXIX - 2009
2011; 2019: 87-106
View details for DOI 10.1007/978-3-642-19580-8_5
View details for Web of Science ID 000343974900006
-
Gaussian Processes
TOPOLOGICAL COMPLEXITY OF SMOOTH RANDOM FUNCTIONS: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XXXIX - 2009
2011; 2019: 13-35
View details for DOI 10.1007/978-3-642-19580-8_2
View details for Web of Science ID 000343974900003
-
Some Geometry and Some Topology
TOPOLOGICAL COMPLEXITY OF SMOOTH RANDOM FUNCTIONS: ECOLE D'ETE DE PROBABILITES DE SAINT-FLOUR XXXIX - 2009
2011; 2019: 37-58
View details for DOI 10.1007/978-3-642-19580-8_3
View details for Web of Science ID 000343974900004
-
A statistician plays darts
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY
2011; 174: 213-226
View details for Web of Science ID 000285969600013
-
HIV-1 Protease Mutations and Protease Inhibitor Cross-Resistance
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2010; 54 (10): 4253-4261
Abstract
The effects of many protease inhibitor (PI)-selected mutations on the susceptibility to individual PIs are unknown. We analyzed in vitro susceptibility test results on 2,725 HIV-1 protease isolates. More than 2,400 isolates had been tested for susceptibility to fosamprenavir, indinavir, nelfinavir, and saquinavir; 2,130 isolates had been tested for susceptibility to lopinavir; 1,644 isolates had been tested for susceptibility to atazanavir; 1,265 isolates had been tested for susceptibility to tipranavir; and 642 isolates had been tested for susceptibility to darunavir. We applied least-angle regression (LARS) to the 200 most common mutations in the data set and identified a set of 46 mutations associated with decreased PI susceptibility of which 40 were not polymorphic in the eight most common HIV-1 group M subtypes. We then used least-squares regression to ascertain the relative contribution of each of these 46 mutations. The median number of mutations associated with decreased susceptibility to each PI was 28 (range, 19 to 32), and the median number of mutations associated with increased susceptibility to each PI was 2.5 (range, 1 to 8). Of the mutations with the greatest effect on PI susceptibility, I84AV was associated with decreased susceptibility to eight PIs; V32I, G48V, I54ALMSTV, V82F, and L90M were associated with decreased susceptibility to six to seven PIs; I47A, G48M, I50V, L76V, V82ST, and N88S were associated with decreased susceptibility to four to five PIs; and D30N, I50L, and V82AL were associated with decreased susceptibility to fewer than four PIs. This study underscores the greater impact of nonpolymorphic mutations compared with polymorphic mutations on decreased PI susceptibility and provides a comprehensive quantitative assessment of the effects of individual mutations on susceptibility to the eight clinically available PIs.
View details for DOI 10.1128/AAC.00574-10
View details for Web of Science ID 000281907200028
View details for PubMedID 20660676
View details for PubMedCentralID PMC2944562
-
Predicting Tipranavir and Darunavir Resistance Using Genotypic, Phenotypic, and Virtual Phenotypic Resistance Patterns: an Independent Cohort Analysis of Clinical Isolates Highly Resistant to All Other Protease Inhibitors
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2010; 54 (6): 2473-2479
Abstract
Genotypic interpretation systems (GISs) for darunavir and tipranavir susceptibility are rarely tested by the use of independent data sets. The virtual phenotype (the phenotype determined by Virco [the "Vircotype"]) was used to interpret all genotypes in Québec, Canada, and phenotypes were determined for isolates predicted to be resistant to all protease inhibitors other than darunavir and tipranavir. We used multivariate analyses to predict relative phenotypic susceptibility to darunavir and tipranavir. We compared the performance characteristics of the Agence Nationale de Recherche sur le Sida scoring algorithm, the Stanford HIV database scoring algorithm (with separate analyses of the discrete and numerical scores), the Vircotype, and the darunavir and tipranavir manufacturers' scores for prediction of the phenotype. Of the 100 isolates whose phenotypes were determined, 89 and 72 were susceptible to darunavir and tipranavir, respectively. In multivariate analyses, the presence of I84V and V82T and the lack of L10F predicted that the isolates would be more susceptible to darunavir than tipranavir. The presence of I54L, V32I, and I47V predicted that the isolates would be more susceptible to tipranavir. All GISs except the system that provided the Stanford HIV database discrete score performed well in predicting the darunavir resistance phenotype (R² = 0.61 to 0.69); the R² value for the Stanford HIV database discrete scoring system was 0.38. Other than the system that provided the Vircotype (R² = 0.80), all GISs performed poorly in predicting the tipranavir resistance phenotype (R² = 0.00 to 0.31). In this independent cohort harboring highly protease inhibitor-resistant HIV isolates, reduced phenotypic susceptibility to darunavir and tipranavir was rare. Generally, GISs predict susceptibility to darunavir substantially better than they predict susceptibility to tipranavir.
View details for DOI 10.1128/AAC.00096-10
View details for Web of Science ID 000277756000025
View details for PubMedID 20368406
View details for PubMedCentralID PMC2876425
-
Group Comparison of Eigenvalues and Eigenvectors of Diffusion Tensors
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2010; 105 (490): 588-599
View details for DOI 10.1198/jasa.2010.ap07291
View details for Web of Science ID 000280216700012
-
EXCURSION SETS OF THREE CLASSES OF STABLE RANDOM FIELDS
ADVANCES IN APPLIED PROBABILITY
2010; 42 (2): 293-318
View details for Web of Science ID 000278796800001
-
International Cohort Analysis of the Antiviral Activities of Zidovudine and Tenofovir in the Presence of the K65R Mutation in Reverse Transcriptase
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY
2010; 54 (4): 1520-1525
Abstract
A K65R mutation in HIV-1 reverse transcriptase can occur with the failure of tenofovir-, didanosine-, abacavir-, and, in some cases, stavudine-containing regimens and leads to reduced phenotypic susceptibility to these drugs and hypersusceptibility to zidovudine, but its clinical impact is poorly described. We identified isolates with the K65R mutation within the Stanford Resistance Database and a French cohort for which subsequent treatment and virological response data were available. The partial genotypic susceptibility score (pGSS) was defined as the genotypic susceptibility score (GSS) excluding the salvage regimen's nucleoside reverse transcriptase inhibitor (NRTI) component. A three-part virologic response variable was defined (e.g., complete virologic response, partial virologic response, and no virologic response). Univariate, multivariate, and bootstrap analyses evaluated factors associated with the virologic response, focusing on the contributions of zidovudine and tenofovir. Seventy-one of 130 patients (55%) achieved a complete virologic response (defined as an HIV RNA level of <200 copies/ml). In univariate analyses, pGSS and zidovudine use in the salvage regimen were predictors of the virologic response. In a multivariate analysis, pGSS and zidovudine and tenofovir use were associated with the virologic response. Bootstrap analyses showed similar reductions in HIV RNA levels with zidovudine or tenofovir use (0.5 to 0.9 log10). In the presence of K65R, zidovudine and tenofovir are associated with similar reductions in HIV RNA levels. Given its tolerability, tenofovir may be the preferred agent over zidovudine even in the presence of the K65R mutation.
View details for DOI 10.1128/AAC.01380-09
View details for Web of Science ID 000275662700017
View details for PubMedID 20124005
View details for PubMedCentralID PMC2849386
-
Group Comparison of Eigenvalues and Eigenvectors of Diffusion Tensors.
Journal of the American Statistical Association
2010; 105 (490): 588-599
Abstract
Diffusion tensor imaging (DTI) data differ from most medical images in that values at each voxel are not scalars, but 3 × 3 symmetric positive definite matrices called diffusion tensors (DTs). The anatomic characteristics of the tissue at each voxel are reflected by the DT eigenvalues and eigenvectors. In this article we consider the problem of testing whether the means of two groups of DT images are equal at each voxel in terms of the DT's eigenvalues, eigenvectors, or both. Because eigendecompositions are highly nonlinear, existing likelihood ratio statistics (LRTs) for testing differences in eigenvalues or eigenvectors of means of Gaussian symmetric matrices assume an orthogonally invariant covariance structure between the matrix entries. While retaining the form of the LRTs, we derive new approximations to their true distributions when the covariance between the DT entries is arbitrary and possibly different between the two groups. The approximate distributions are those of similar LRT statistics computed on the tangent space to the parameter manifold at the true value of the parameter, but plugging in an estimate for the point of application of the tangent space. The resulting distributions, which are weighted sums of chi-squared distributions, are further approximated by scaled chi-squared distributions by matching the first two moments. For validity of the Gaussian model, the positive definite constraints on the DT are removed via a matrix log transformation, although this is not crucial asymptotically. Voxelwise application of the test statistics leads to a multiple-testing problem, which is solved by false discovery rate inference. The foregoing methods are illustrated in a DTI group comparison of boys versus girls.
View details for DOI 10.1198/jasa.2010.ap07291
View details for PubMedID 35386273
View details for PubMedCentralID PMC8982984
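The moment-matching step mentioned in this abstract can be written out as a short sketch. It assumes the standard two-moment (Satterthwaite-style) match: a weighted sum of independent chi-squared variables with weights w_i and degrees of freedom d_i has mean sum(w_i*d_i) and variance 2*sum(w_i^2*d_i), and a scaled chi-squared a*chi2(nu) has mean a*nu and variance 2*a^2*nu. The function name is hypothetical, and the paper's own derivation may differ in detail.

```python
def match_scaled_chi2(weights, dfs):
    """Scale a and degrees of freedom nu of the scaled chi-squared
    whose first two moments match sum_i w_i * chi2(d_i).

    Solving a*nu = sum(w*d) and a^2*nu = sum(w^2*d) gives
    a = sum(w^2*d) / sum(w*d) and nu = sum(w*d)^2 / sum(w^2*d).
    """
    mean = sum(w * d for w, d in zip(weights, dfs))
    half_var = sum(w * w * d for w, d in zip(weights, dfs))
    scale = half_var / mean
    df = mean * mean / half_var
    return scale, df


# Equal weights reduce to an ordinary chi-squared with pooled df.
scale_equal, df_equal = match_scaled_chi2([1.0, 1.0], [2, 3])
# Unequal weights give a non-integer effective df.
scale_mixed, df_mixed = match_scaled_chi2([1.0, 3.0], [1, 1])
```

With weights 1 and 3 on one degree of freedom each, the match gives scale 2.5 and 1.6 effective degrees of freedom, which shows why the approximating distribution generally has fractional degrees of freedom.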
-
Predictive Value of HIV-1 Genotypic Resistance Test Interpretation Algorithms
JOURNAL OF INFECTIOUS DISEASES
2009; 200 (3): 453-463
Abstract
Interpreting human immunodeficiency virus type 1 (HIV-1) genotypic drug-resistance test results is challenging for clinicians treating HIV-1-infected patients. Multiple drug-resistance interpretation algorithms have been developed, but their predictive value has rarely been evaluated using contemporary clinical data sets. We examined the predictive value of 4 algorithms at predicting virologic response (VR) during 734 treatment-change episodes (TCEs). VR was defined as attaining plasma HIV-1 RNA levels below the limit of quantification. Drug-specific genotypic susceptibility scores (GSSs) were calculated by applying each algorithm to the baseline genotype. Weighted GSSs were calculated by multiplying drug-specific GSSs by antiretroviral (ARV) potency factors. Regimen-specific GSSs (rGSSs) were calculated by adding unweighted or weighted drug-specific GSSs for each salvage therapy ARV. The predictive value of rGSSs was estimated by use of multivariate logistic regression. Of 734 TCEs, 475 (65%) were associated with VR. The rGSSs for the 4 algorithms were the variables most strongly predictive of VR. The adjusted rGSS odds ratios ranged from 1.6 to 2.2 (P < .001). Using 10-fold cross-validation, the averaged area under the receiver operating characteristic curve for all algorithms increased from 0.76 with unweighted rGSSs to 0.80 with weighted rGSSs. Unweighted and weighted rGSSs of 4 genotypic resistance algorithms were the strongest independent predictors of VR. Optimizing ARV weighting may further improve VR predictions.
View details for DOI 10.1086/600073
View details for Web of Science ID 000267604000018
View details for PubMedID 19552527
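The rGSS arithmetic described in this abstract is a weighted sum, sketched below. The drug scores and potency factors are made-up illustrative numbers, not values from the study, and the function name is hypothetical.

```python
def regimen_gss(drug_gss, regimen, potency=None):
    """Regimen-specific GSS: sum of drug-specific scores over the regimen.

    With `potency` given, each drug-specific score is first multiplied
    by that drug's potency factor (the weighted variant).
    """
    total = 0.0
    for drug in regimen:
        weight = 1.0 if potency is None else potency[drug]
        total += drug_gss[drug] * weight
    return total


# Hypothetical scores: 1.0 = susceptible, 0.5 = intermediate.
drug_gss = {"AZT": 1.0, "TDF": 0.5, "LPV": 1.0}
# Hypothetical potency factors for the weighted score.
potency = {"AZT": 1.0, "TDF": 1.0, "LPV": 2.0}
regimen = ["AZT", "TDF", "LPV"]

unweighted = regimen_gss(drug_gss, regimen)
weighted = regimen_gss(drug_gss, regimen, potency)
```

In the study's design, a score of this kind is then used as a covariate in a logistic regression for virologic response.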
-
A Tribute to: Keith Worsley-1951-2009 In Memoriam
NEUROIMAGE
2009; 46 (4): 891-894
View details for DOI 10.1016/j.neuroimage.2009.04.026
View details for Web of Science ID 000266975600001
View details for PubMedID 19374950
-
GAUSSIAN PROCESSES, KINEMATIC FORMULAE AND POINCARE'S LIMIT
ANNALS OF PROBABILITY
2009; 37 (4): 1459-1482
View details for DOI 10.1214/08-AOP439
View details for Web of Science ID 000268692700007
-
Special Issue on Mathematics in Brain Imaging
NEUROIMAGE
2009; 45 (1): S1-S2
View details for DOI 10.1016/j.neuroimage.2008.10.033
View details for Web of Science ID 000263862600001
View details for PubMedID 19027863
-
Maintaining Reduced Viral Fitness and CD4 Response in HIV-Infected Patients with Viremia Receiving a Boosted Protease Inhibitor
CLINICAL INFECTIOUS DISEASES
2009; 48 (5): 680-682
Abstract
When fully suppressive regimens are not available, incompletely suppressive regimens also provide immunologic benefits. In this study, with stable background therapy, human immunodeficiency virus (HIV)-infected patients who were randomized to receive atazanavir or boosted atazanavir, compared with those who continued boosted protease inhibitor therapy, maintained similar virologic and immunologic control, resistance-mutation patterns, and replication capacities with reduced use of lipid-lowering medication.
View details for DOI 10.1086/597008
View details for Web of Science ID 000263061700026
View details for PubMedID 19191657
-
Empirical null and false discovery rate analysis in neuroimaging
NEUROIMAGE
2009; 44 (1): 71-82
Abstract
Current strategies for thresholding statistical parametric maps in neuroimaging include control of the family-wise error rate, control of the false discovery rate (FDR) and thresholding of the posterior probability of a voxel being active given the data, the latter derived from a mixture model of active and inactive voxels. Correct inference using any of these criteria depends crucially on the specification of the null distribution of the test statistics. In this article we show examples from fMRI and DTI data where the theoretical null distribution does not match well the observed distribution of the test statistics. As a solution, we introduce the use of an empirical null, a null distribution empirically estimated from the data itself, allowing for global corrections of theoretical null assumptions. The theoretical null distributions considered are normal, t, chi-squared and F, all commonly encountered in neuroimaging. The empirical null estimate is accompanied by an estimate of the proportion of non-active voxels in the data. Based on the two-class mixture model, we present the equivalence between the strategies of controlling FDR and thresholding posterior probabilities in the context of neuroimaging and show that the FDR estimates derived from the empirical null can be seen as empirical Bayes estimates.
View details for DOI 10.1016/j.neuroimage.2008.04.182
View details for Web of Science ID 000262300900009
View details for PubMedID 18547821
-
INFERENCE FOR EIGENVALUES AND EIGENVECTORS OF GAUSSIAN SYMMETRIC MATRICES
ANNALS OF STATISTICS
2008; 36 (6): 2886-2919
View details for DOI 10.1214/08-AOS628
View details for Web of Science ID 000262731400011
-
TILTED EULER CHARACTERISTIC DENSITIES FOR CENTRAL LIMIT RANDOM FIELDS, WITH APPLICATION TO "BUBBLES"
ANNALS OF STATISTICS
2008; 36 (5): 2471-2507
View details for DOI 10.1214/07-AOS549
View details for Web of Science ID 000260554100018
-
FALSE DISCOVERY RATE ANALYSIS OF BRAIN DIFFUSION DIRECTION MAPS
ANNALS OF APPLIED STATISTICS
2008; 2 (1): 153-175
View details for DOI 10.1214/07-AOAS133
View details for Web of Science ID 000261057700013
-
FALSE DISCOVERY RATE ANALYSIS OF BRAIN DIFFUSION DIRECTION MAPS.
The annals of applied statistics
2008; 2 (1): 153-175
Abstract
Diffusion tensor imaging (DTI) is a novel modality of magnetic resonance imaging that allows noninvasive mapping of the brain's white matter. A particular map derived from DTI measurements is a map of water principal diffusion directions, which are proxies for neural fiber directions. We consider a study in which diffusion direction maps were acquired for two groups of subjects. The objective of the analysis is to find regions of the brain in which the corresponding diffusion directions differ between the groups. This is attained by first computing a test statistic for the difference in direction at every brain location using a Watson model for directional data. Interesting locations are subsequently selected with control of the false discovery rate. More accurate modeling of the null distribution is obtained using an empirical null density based on the empirical distribution of the test statistics across the brain. Further, substantial improvements in power are achieved by local spatial averaging of the test statistic map. Although the focus is on one particular study and imaging technology, the proposed inference methods can be applied to other large scale simultaneous hypothesis testing problems with a continuous underlying spatial structure.
View details for DOI 10.1214/07-AOAS133
View details for PubMedID 35388313
View details for PubMedCentralID PMC8982959
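Selecting locations "with control of the false discovery rate" can be illustrated with the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the abstract does not say which FDR procedure was used, and the paper additionally relies on an empirical null and spatial averaging that this sketch omits.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg procedure at level q.

    Finds the largest rank k with p_(k) <= q * k / m and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


# Step-up behaviour: the two smallest p-values miss their own
# thresholds, but the third passes, so all three are rejected.
selected = benjamini_hochberg([0.02, 0.9, 0.03, 0.035], q=0.05)
```

Each p-value here would correspond to one brain location's test statistic.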
-
Random fields of multivariate test statistics, with applications to shape analysis
ANNALS OF STATISTICS
2008; 36 (1): 1-27
View details for DOI 10.1214/009053607000000406
View details for Web of Science ID 000253390000001
-
Detecting sparse signals in random fields, with an application to brain mapping
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2007; 102 (479): 913-928
View details for DOI 10.1198/016214507000000815
View details for Web of Science ID 000249752300020
-
Maxima of discretely sampled random fields, with an application to 'bubbles'
BIOMETRIKA
2007; 94 (1): 1-18
View details for DOI 10.1093/biomet/asm004
View details for Web of Science ID 000244839800001
-
Forward stagewise regression and the monotone lasso
ELECTRONIC JOURNAL OF STATISTICS
2007; 1: 1-29
View details for DOI 10.1214/07-EJS004
View details for Web of Science ID 000207854200001
-
Genotypic predictors of human immunodeficiency virus type 1 drug resistance
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2006; 103 (46): 17355-17360
Abstract
Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in ≥2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
View details for DOI 10.1073/pnas.0607274103
View details for Web of Science ID 000242249400053
View details for PubMedID 17065321
View details for PubMedCentralID PMC1622926
-
Inference for magnitudes and delays of responses in the FIAC data using BRAINSTAT/FMRISTAT
Joint Statistical Meeting of the American-Statistical-Association
WILEY-LISS. 2006: 434–41
Abstract
We used straightforward linear mixed effects models as described in Worsley et al. together with recent advances in smoothing to control the degrees of freedom, and random field theory based on discrete local maxima. This has been implemented in BRAINSTAT, a Python version of FMRISTAT. Our main novelty is voxel-wise inference for both magnitude and delay (latency) of the hemodynamic response. Our analysis appears to be more sensitive than that of Dehaene-Lambertz et al. Our main findings are greater magnitude (1.08% +/- 0.17%) and delay (0.153 +/- 0.035 s) for different sentences compared to same sentences, together with a smaller but still significantly greater magnitude for different speaker compared to same speaker (0.47% +/- 0.08%).
View details for DOI 10.1002/hbm.20248
View details for Web of Science ID 000237145800010
View details for PubMedID 16568426
-
A tail strength measure for assessing the overall univariate significance in a dataset
BIOSTATISTICS
2006; 7 (2): 167-181
Abstract
We propose an overall measure of significance for a set of hypothesis tests. The 'tail strength' is a simple function of the p-values computed for each of the tests. This measure is useful, for example, in assessing the overall univariate strength of a large set of features in microarray and other genomic and biomedical studies. It also has a simple relationship to the false discovery rate of the collection of tests. We derive the asymptotic distribution of the tail strength measure, and illustrate its use on a number of real datasets.
View details for DOI 10.1093/biostatistics/kxj009
View details for Web of Science ID 000236436300001
View details for PubMedID 16332926
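A small sketch of a tail strength computation follows. The abstract says only that the measure is "a simple function of the p-values"; the formula used here, TS = (1/m) * sum_k (1 - p_(k) * (m + 1) / k) over the ordered p-values p_(1) <= ... <= p_(m), is an assumed definition and should be checked against the paper. The function name is hypothetical.

```python
def tail_strength(pvalues):
    """Tail strength under the assumed definition
    TS = (1/m) * sum_k (1 - p_(k) * (m + 1) / k).

    Under a uniform null the k-th ordered p-value has expectation
    k / (m + 1), so each term, and hence TS, is centred at zero.
    Values near 1 indicate many very small p-values.
    """
    m = len(pvalues)
    ordered = sorted(pvalues)
    terms = (1 - p * (m + 1) / k for k, p in enumerate(ordered, start=1))
    return sum(terms) / m


# p-values sitting exactly at their null expectations give TS = 0.
ts_null = tail_strength([0.2, 0.4, 0.6, 0.8])
# p-values of zero give the maximum value TS = 1.
ts_strong = tail_strength([0.0, 0.0, 0.0])
```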
-
Detecting fMRI activation allowing for unknown latency of the hemodynamic response
NEUROIMAGE
2006; 29 (2): 649-654
Abstract
Several authors have suggested allowing for unknown latency of the hemodynamic response by incorporation of hemodynamic derivative terms into the linear model for the statistical analysis of fMRI data. In this paper, we show how to use random field theory to provide a P value for local maxima of two test statistics that have been recently proposed for detecting activation based on this analysis.
View details for DOI 10.1016/j.neuroimage.2005.07.032
View details for Web of Science ID 000234841200031
View details for PubMedID 16125978
-
Virological response to antiretroviral therapy in the setting of the K65R mutation
15th International HIV Drug Resistance Workshop
INT MEDICAL PRESS LTD. 2006: S92–S92
View details for Web of Science ID 000239984700102
-
A Gaussian kinematic formula
ANNALS OF PROBABILITY
2006; 34 (1): 122-158
View details for Web of Science ID 000235925000004
-
HIV-1 protease and reverse-transcriptase mutations: Correlations with antiretroviral therapy in subtype B isolates and implications for drug-resistance surveillance
13th International AIDS Conference
UNIV CHICAGO PRESS. 2005: 456–65
Abstract
Background. It is important, for drug-resistance surveillance, to identify human immunodeficiency virus type 1 (HIV-1) strains that have undergone antiretroviral drug selection. Methods. We compared the prevalence of protease and reverse-transcriptase (RT) mutations in HIV-1 sequences from persons with and without previous treatment with protease inhibitors (PIs), nucleoside RT inhibitors (NRTIs), and nonnucleoside RT inhibitors (NNRTIs). Treatment-associated mutations in protease isolates from 5867 persons and RT isolates from 6247 persons were categorized by whether they were polymorphic (prevalence, >0.5%) in untreated individuals and whether they were established drug-resistance mutations. New methods were introduced to minimize misclassification from transmitted resistance, population stratification, sequencing artifacts, and multiple hypothesis testing. Results. Some 36 established and 24 additional nonpolymorphic protease mutations at 34 positions were related to PI treatment, 21 established and 22 additional nonpolymorphic RT mutations at 24 positions with NRTI treatment, and 15 established and 11 additional nonpolymorphic RT mutations at 15 positions with NNRTI treatment. In addition, 11 PI-associated and 1 NRTI-associated established mutations were polymorphic in viruses from untreated persons. Conclusions. Established drug-resistance mutations encompass only a subset of treatment-associated mutations; some of these are polymorphic in untreated persons. In contrast, nonpolymorphic treatment-associated mutations may be more sensitive and specific markers of transmitted HIV-1 drug resistance.
View details for Web of Science ID 000230387500013
View details for PubMedID 15995959
-
Validity of the expected Euler characteristic heuristic
ANNALS OF PROBABILITY
2005; 33 (4): 1362-1396
View details for DOI 10.1214/009117905000000099
View details for Web of Science ID 000230469400005
-
Cross-subject comparison of principal diffusion direction maps
MAGNETIC RESONANCE IN MEDICINE
2005; 53 (6): 1423-1431
Abstract
Diffusion tensor imaging (DTI) data differ fundamentally from most brain imaging data in that values at each voxel are not scalars but 3 x 3 positive definite matrices also called diffusion tensors. Frequently, investigators simplify the data analysis by reducing the tensor to a scalar, such as fractional anisotropy (FA). New statistical methods are needed for analyzing vector and tensor valued imaging data. A statistical model is proposed for the principal eigenvector of the diffusion tensor based on the bipolar Watson distribution. Methods are presented for computing mean direction and dispersion of a sample of directions and for testing whether two samples of directions (e.g., same voxel across two groups of subjects) have the same mean. False discovery rate theory is used to identify voxels for which the two-sample test is significant. These methods are illustrated in a DTI data set collected to study reading ability. It is shown that comparison of directions reveals differences in gross anatomic structure that are invisible to FA.
View details for DOI 10.1002/mrm.20503
View details for Web of Science ID 000229468200024
View details for PubMedID 15906307
-
Distributed neural representation of expected value
JOURNAL OF NEUROSCIENCE
2005; 25 (19): 4806-4812
Abstract
Anticipated reward magnitude and probability comprise dual components of expected value (EV), a cornerstone of economic and psychological theory. However, the neural mechanisms that compute EV have not been characterized. Using event-related functional magnetic resonance imaging, we examined neural activation as subjects anticipated monetary gains and losses that varied in magnitude and probability. Group analyses indicated that, although the subcortical nucleus accumbens (NAcc) activated proportional to anticipated gain magnitude, the cortical mesial prefrontal cortex (MPFC) additionally activated according to anticipated gain probability. Individual difference analyses indicated that, although NAcc activation correlated with self-reported positive arousal, MPFC activation correlated with probability estimates. These findings suggest that mesolimbic brain regions support the computation of EV in an ascending and distributed manner: whereas subcortical regions represent an affective component, cortical regions also represent a probabilistic component, and, furthermore, may integrate the two.
View details for DOI 10.1523/JNEUROSCI.0642-05.2005
View details for Web of Science ID 000229038300014
View details for PubMedID 15888656
-
Comparison of the precision and sensitivity of the antivirogram and PhenoSense HIV drug susceptibility assays
JAIDS-JOURNAL OF ACQUIRED IMMUNE DEFICIENCY SYNDROMES
2005; 38 (4): 439-444
Abstract
Although 2 widely used susceptibility assays have been developed, their precision and sensitivity have not been assessed. To assess the precision of the Antivirogram and PhenoSense assays, we examined susceptibility results of HIV-1 isolates lacking drug resistance mutations and containing matching patterns of drug resistance mutations. To assess sensitivity, we determined for each assay the proportion of isolates with common patterns of matching drug resistance mutations having reductions in susceptibility greater than those in isolates without drug resistance mutations. We analyzed protease inhibitor (PI) susceptibility results obtained by the Antivirogram assay for 293 isolates and by the PhenoSense assay for 300 isolates. We analyzed reverse transcriptase (RT) inhibitor susceptibility results obtained by the Antivirogram assay for 202 isolates and by the PhenoSense assay for 126 isolates. For wild-type and mutant isolates, the median absolute deviance of the fold resistance of nucleoside RT inhibitor susceptibility results was significantly lower for the PhenoSense assay than for the Antivirogram assay. The PhenoSense assay was also significantly more likely than the Antivirogram assay to detect resistance to abacavir, didanosine, and stavudine in isolates with the common drug resistance mutations M41L, M184V, and T215Y (+/-L210W). We found no significant differences between the 2 assays for detecting PI and nonnucleoside RT inhibitor resistance. The PhenoSense assay is more precise than the Antivirogram assay and superior at detecting resistance to abacavir, didanosine, and stavudine.
View details for Web of Science ID 000227665000009
View details for PubMedID 15764961
-
The 'miss rate' for the analysis of gene expression data
BIOSTATISTICS
2005; 6 (1): 111-117
Abstract
Multiple testing issues are important in gene expression studies, where typically thousands of genes are compared over two or more experimental conditions. The false discovery rate has become a popular measure in this setting. Here we discuss a complementary measure, the 'miss rate', and show how to estimate it in practice.
View details for DOI 10.1093/biostatistics/kxh021
View details for Web of Science ID 000226346300009
View details for PubMedID 15618531
-
Evolution of resistance to drugs in HIV-1-infected patients failing antiretroviral therapy
AIDS
2004; 18 (11): 1503-1511
Abstract
The optimal time for changing failing antiretroviral therapy (ART) is not known. It involves balancing the risk of exhausting future treatment options against the risk of developing increased drug resistance. The frequency with which new drug-resistance mutations (DRM) developed and their potential consequences in patients continuing unchanged treatment despite persistent viremia were assessed. A retrospective study of consecutive sequence samples from 106 patients at one institution with viral load (VL) of more than 400 copies/ml, with no change in ART for more than 2 months despite virologic failure. Two consecutive pol sequences, CD4 cell counts and VL were analyzed to quantify the development of new DRM and to identify changes in immunologic and virologic parameters. Genotypic susceptibility scores (GSS) and viral drug susceptibilities were calculated by a computer program (HIVDB). Poisson log-linear regression models were used to predict the expected number of mutations at the second time point. After a median of 14 months of continued ART, 75% (80 of 106) of patients acquired new DRM and were assigned a significantly lower GSS, potentially limiting the success of future ART. The development of new DRM was proportional to the time between the two sequences and inversely proportional to the number of DRM in the first sequence. However, the development of DRM was not associated with significant changes in CD4 or VL counts. Despite stable levels of CD4 and VL over time, maintaining a failing therapeutic regimen increases drug resistance and may limit future treatment options.
View details for DOI 10.1097/01.aids.0000131358.29586.6b
View details for Web of Science ID 000222934500004
View details for PubMedID 15238768
View details for PubMedCentralID PMC2547474
-
Unified univariate and multivariate random field theory
Conference on Mathematics in Brain Imaging
ACADEMIC PRESS INC ELSEVIER SCIENCE. 2004: S189–S195
Abstract
We report new random field theory P values for peaks of canonical correlation SPMs for detecting multiple contrasts in a linear model for multivariate image data. This completes results for all types of univariate and multivariate image data analysis. All other known univariate and multivariate random field theory results are now special cases, so these new results present a true unification of all currently known results. As an illustration, we use these results in a deformation-based morphometry (DBM) analysis to look for regions of the brain where vector deformations of nonmissile trauma patients are related to several verbal memory scores, to detect regions of changes in anatomical effective connectivity between the trauma patients and a group of age- and sex-matched controls, and to look for anatomical connectivity in cortical thickness.
View details for DOI 10.1016/j.neuroimage.2004.07.026
View details for Web of Science ID 000225374100018
View details for PubMedID 15501088
-
Lack of detectable human immunodeficiency virus type 1 superinfection during 1072 person-years of observation
11th International Workshop on HIV Drug Resistance and Treatment Strategies
UNIV CHICAGO PRESS. 2003: 397–405
Abstract
We examined consecutive protease (PR) and reverse transcriptase (RT) sequences from human immunodeficiency virus (HIV) type 1-infected individuals, to distinguish changes resulting from sequence evolution from those due to possible superinfection. Between July 1997 and December 2001, ≥2 PR and RT samples from 718 persons were sequenced at Stanford University Hospital. Thirty-seven persons had highly divergent sequence pairs characterized by a nucleotide distance of >4.5% in PR or >3.0% in RT. In 16 of 37 sequence pairs, divergence resulted from the loss of mutations during a treatment interruption or from the gain of mutations with reinstitution of treatment. tat and/or gag sequencing of HIV-1 from cryopreserved plasma samples could be performed on 15 of the 21 divergent isolate pairs from persons without a treatment interruption. The sequences of these genes, unaffected by selective drug pressure, were monophyletic. Although HIV-1 PR and RT genes from treated persons may become highly divergent, these changes usually are the result of sequence evolution, rather than superinfection.
View details for Web of Science ID 000184316900008
View details for PubMedID 12870121
-
Extended spectrum of HIV-1 reverse transcriptase mutations in patients receiving multiple nucleoside analog inhibitors
AIDS
2003; 17 (6): 791-799
Abstract
To characterize reverse transcriptase (RT) mutations by their association with extent of nucleoside RT inhibitor (NRTI) therapy. To identify mutational clusters in RT sequences from persons receiving multiple NRTI. A total of 1210 RT sequences from persons with known antiretroviral therapy were analyzed: 641 new sequences were performed at Stanford University Hospital; 569 were previously published. Chi-square tests and logistic regression were done to identify associations between mutations and NRTI therapy. Correlation studies were done to identify mutational clusters. The Benjamini-Hochberg procedure was used to correct for multiple comparisons. Mutations at 26 positions were significantly associated with NRTI including 17 known resistance mutations (positions 41, 44, 62, 65, 67, 69, 70, 74, 75, 77, 116, 118, 151, 184, 210, 215, 219) and nine previously unreported mutations (positions 20, 39, 43, 203, 208, 218, 221, 223, 228). The nine new mutations correlated linearly with number of NRTI; 777 out of 817 (95%) instances occurred with known drug resistance mutations. Positions 203, 208, 218, 221, 223, and 228 were conserved in untreated persons; positions 20, 39, and 43 were polymorphic. Most NRTI-associated mutations clustered into three groups: (i) 62, 65, 75, 77, 115, 116, 151; (ii) 41, 43, 44, 118, 208, 210, 215, 223; (iii) 67, 69, 70, 218, 219, 228. Mutations at nine previously unreported positions are associated with NRTI therapy. These mutations are probably accessory because they occur almost exclusively with known drug resistance mutations. Most NRTI mutations group into one of three clusters, although several (e.g., M184V) occur in multiple mutational contexts.
View details for DOI 10.1097/01.aids.0000050860.71999.23
View details for Web of Science ID 000182779700005
View details for PubMedID 12660525
View details for PubMedCentralID PMC2573403
-
Mutation patterns and structural correlates in human immunodeficiency virus type 1 protease following different protease inhibitor treatments
JOURNAL OF VIROLOGY
2003; 77 (8): 4836-4847
Abstract
Although many human immunodeficiency virus type 1 (HIV-1)-infected persons are treated with multiple protease inhibitors in combination or in succession, mutation patterns of protease isolates from these persons have not been characterized. We collected and analyzed 2,244 subtype B HIV-1 isolates from 1,919 persons with different protease inhibitor experiences: 1,004 isolates from untreated persons, 637 isolates from persons who received one protease inhibitor, and 603 isolates from persons receiving two or more protease inhibitors. The median number of protease mutations per isolate increased from 4 in untreated persons to 12 in persons who had received four or more protease inhibitors. Mutations at 45 of the 99 amino acid positions in the protease (including 22 not previously associated with drug resistance) were significantly associated with protease inhibitor treatment. Mutations at 17 of the remaining 99 positions were polymorphic but not associated with drug treatment. Pairs and clusters of correlated (covarying) mutations were significantly more likely to occur in treated than in untreated persons: 115 versus 23 pairs and 30 versus 2 clusters, respectively. Of the 115 statistically significant pairs of covarying residues in the treated isolates, 59 were within 8 Å of each other, many more than would be expected by chance. In summary, nearly one-half of HIV-1 protease positions are under selective drug pressure, including many residues not previously associated with drug resistance. Structural factors appear to be responsible for the high frequency of covariation among many of the protease residues. The presence of mutational clusters provides insight into the complex mutational patterns required for HIV-1 protease inhibitor resistance.
View details for DOI 10.1128/JVI.77.8.4836-4847.2003
View details for Web of Science ID 000181970200037
View details for PubMedID 12663790
View details for PubMedCentralID PMC152121
-
Euler characteristics for Gaussian fields on manifolds
ANNALS OF PROBABILITY
2003; 31 (2): 533-563
View details for Web of Science ID 000182167600001
-
Deformation-based surface morphometry applied to gray matter deformation
NEUROIMAGE
2003; 18 (2): 198-213
Abstract
We present a unified statistical approach to deformation-based morphometry applied to the cortical surface. The cerebral cortex has the topology of a 2D highly convoluted sheet. As the brain develops over time, the cortical surface area, thickness, curvature, and total gray matter volume change. It is highly likely that such age-related surface changes are not uniform. By measuring how such surface metrics change over time, the regions of the most rapid structural changes can be localized. We avoid surface flattening in our analysis because it distorts the inherent geometry of the cortex; it is used only in visualization. To increase the signal-to-noise ratio, diffusion smoothing, which generalizes Gaussian kernel smoothing to an arbitrary curved cortical surface, has been developed and applied to surface data. Statistical inference on the cortical surface is then performed via random field theory. As an illustration, we demonstrate how this new surface-based morphometry can be applied to localize cortical regions of gray matter tissue growth and loss in brain images collected longitudinally in a group of children and adolescents.
View details for Web of Science ID 000181182500002
View details for PubMedID 12595176
-
Spectrum of HIV-1 reverse transcriptase mutations selected by nucleoside reverse transcriptase inhibitor treatment is greater than previously reported
INT MEDICAL PRESS LTD. 2002: S38–S38
View details for Web of Science ID 000177401300034