Art Owen
Max H. Stein Professor
Statistics
Administrative Appointments
-
Chair, Department of Statistics, Stanford University (2018 - Present)
Honors & Awards
-
SIAM Fellow, Society for Industrial and Applied Mathematics (2024)
-
Gold medal, Statistical Society of Canada (2021)
-
Noether Distinguished Scholar, American Statistical Association (2020)
Current Research and Scholarly Interests
Statistical methods to analyze large data matrices in bioinformatics
2024-25 Courses
- Applied Multivariate Analysis
STATS 206 (Spr)
- Design of Experiments
STATS 263, STATS 363 (Win)
- Literature of Statistics
STATS 319 (Spr)
Independent Studies (6)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Industrial Research for Statisticians
STATS 398 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Ph.D. Research
CME 400 (Aut, Win, Spr, Sum)
- Research
STATS 399 (Aut, Win, Spr, Sum)
Prior Year Courses
2023-24 Courses
- Applied Multivariate Analysis
BIODS 206, STATS 206 (Aut)
- Applied Statistics I
STATS 305A (Aut)
- Topic: Monte Carlo
STATS 362 (Win)
2022-23 Courses
- Design of Experiments
STATS 263, STATS 363 (Aut)
- Empirical Likelihood
STATS 365 (Spr)
- Sampling
STATS 204 (Spr)
2021-22 Courses
- Design of Experiments
STATS 263, STATS 363 (Aut)
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Disha Ghandwani, Sophia Lu, Anav Sood
-
Doctoral Dissertation Advisor (AC)
Harrison Li, Tim Morrison
Graduate and Fellowship Programs
-
Biomedical Data Science (PhD Program)
All Publications
-
SUPER-POLYNOMIAL ACCURACY OF MULTIDIMENSIONAL RANDOMIZED NETS USING THE MEDIAN-OF-MEANS
MATHEMATICS OF COMPUTATION
2024
View details for DOI 10.1090/mcom/3880
View details for Web of Science ID 001208341500001
-
Estimating means of bounded random variables by betting
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2024; 86 (1)
View details for DOI 10.1093/jrsssb/qkad116
View details for Web of Science ID 001163100700017
-
A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model
BERNOULLI
2024; 30 (1): 743-769
View details for DOI 10.3150/23-BEJ1615
View details for Web of Science ID 001171838900025
-
A GENERAL CHARACTERIZATION OF OPTIMAL TIE-BREAKER DESIGNS
ANNALS OF STATISTICS
2023; 51 (3): 1030-1057
View details for DOI 10.1214/23-AOS2275
View details for Web of Science ID 001055382500004
-
The nonzero gain coefficients of Sobol's sequences are always powers of two
JOURNAL OF COMPLEXITY
2023; 75
View details for DOI 10.1016/j.jco.2022.101700
View details for Web of Science ID 000925260500001
-
Combining observational and experimental datasets using shrinkage estimators.
Biometrics
2023
Abstract
We consider the problem of combining data from observational and experimental sources to draw causal conclusions. To derive combined estimators with desirable properties, we extend results from the Stein shrinkage literature. Our contributions are threefold. First, we propose a generic procedure for deriving shrinkage estimators in this setting, making use of a generalized unbiased risk estimate. Second, we develop two new estimators, prove finite sample conditions under which they have lower risk than an estimator using only experimental data, and show that each achieves a notion of asymptotic optimality. Third, we draw connections between our approach and results in sensitivity analysis, including proposing a method for evaluating the feasibility of our estimators.
View details for DOI 10.1111/biom.13827
View details for PubMedID 36629736
-
Deletion and Insertion Tests in Regression Models
JOURNAL OF MACHINE LEARNING RESEARCH
2023; 24
View details for Web of Science ID 001111563200001
-
Kernel regression analysis of tie-breaker designs
ELECTRONIC JOURNAL OF STATISTICS
2023; 17 (1): 243-290
View details for DOI 10.1214/23-EJS2102
View details for Web of Science ID 000951095100005
-
PREINTEGRATION VIA ACTIVE SUBSPACE
SIAM JOURNAL ON NUMERICAL ANALYSIS
2023; 61 (2): 495-514
View details for DOI 10.1137/22M1479129
View details for Web of Science ID 000954803300004
-
SUPER-POLYNOMIAL ACCURACY OF ONE DIMENSIONAL RANDOMIZED NETS USING THE MEDIAN OF MEANS
MATHEMATICS OF COMPUTATION
2022
View details for DOI 10.1090/mcom/3791
View details for Web of Science ID 000872951500001
-
DETECTING MULTIPLE REPLICATING SIGNALS USING ADAPTIVE FILTERING PROCEDURES
ANNALS OF STATISTICS
2022; 50 (4): 1890-1909
View details for DOI 10.1214/21-AOS2139
View details for Web of Science ID 000847855400002
-
Combining randomized field experiments with observational satellite data to assess the benefits of crop rotations on yields
ENVIRONMENTAL RESEARCH LETTERS
2022; 17 (4)
View details for DOI 10.1088/1748-9326/ac6083
View details for Web of Science ID 000778724700001
-
BACKFITTING FOR LARGE SCALE CROSSED RANDOM EFFECTS REGRESSIONS
ANNALS OF STATISTICS
2022; 50 (1): 560-583
View details for DOI 10.1214/21-AOS2121
View details for Web of Science ID 000758697800023
-
On Dropping the First Sobol' Point
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 71-86
View details for DOI 10.1007/978-3-030-98319-2_4
View details for Web of Science ID 000871749800004
-
Scalable logistic regression with crossed random effects
ELECTRONIC JOURNAL OF STATISTICS
2022; 16 (2): 4604-4635
View details for DOI 10.1214/22-EJS2047
View details for Web of Science ID 000953164900017
-
Propensity score methods for merging observational and experimental datasets.
Statistics in medicine
2021
Abstract
We consider how to merge a limited amount of data from a randomized controlled trial (RCT) into a much larger set of data from an observational data base (ODB), to estimate an average causal treatment effect. Our methods are based on stratification. The strata are defined in terms of effect moderators as well as propensity scores estimated in the ODB. Data from the RCT are placed into the strata they would have occupied, had they been in the ODB instead. We assume that treatment differences are comparable in the two data sources. Our first "spiked-in" method simply inserts the RCT data into their corresponding ODB strata. We also consider a data-driven convex combination of the ODB and RCT treatment effect estimates within each stratum. Using the delta method and simulations, we identify a bias problem with the spiked-in estimator that is ameliorated by the convex combination estimator. We apply our methods to data from the Women's Health Initiative, a study of thousands of postmenopausal women which has both observational and experimental data on hormone therapy (HT). Using half of the RCT to define a gold standard, we find that a version of the spiked-in estimator yields lower-MSE estimates of the causal impact of HT on coronary heart disease than would be achieved using either a small RCT or the observational component on its own.
View details for DOI 10.1002/sim.9223
View details for PubMedID 34671998
-
A Strong Law of Large Numbers for Scrambled Net Integration
SIAM REVIEW
2021; 63 (2): 360-372
View details for DOI 10.1137/20M1320535
View details for Web of Science ID 000674286700003
-
Quasi-Monte Carlo Quasi-Newton in Variational Bayes
JOURNAL OF MACHINE LEARNING RESEARCH
2021; 22
View details for Web of Science ID 000706449200001
-
Efficient Estimation of the ANOVA Mean Dimension, with an Application to Neural Net Classification
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION
2021; 9 (2): 708-730
View details for DOI 10.1137/20M1350236
View details for Web of Science ID 000674285900013
-
Designing experiments informed by observational studies
JOURNAL OF CAUSAL INFERENCE
2021; 9 (1): 147-171
View details for DOI 10.1515/jci-2021-0010
View details for Web of Science ID 000677584800006
-
Density Estimation by Randomized Quasi-Monte Carlo
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION
2021; 9 (1): 280–301
View details for DOI 10.1137/19M1259213
View details for Web of Science ID 000643273300010
-
ESTIMATION AND INFERENCE FOR VERY LARGE LINEAR MIXED EFFECTS MODELS
STATISTICA SINICA
2020; 30 (4): 1741–71
View details for DOI 10.5705/ss.202018.0029
View details for Web of Science ID 000576056500004
-
The Square Root Rule for Adaptive Importance Sampling
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION
2020; 30 (2)
View details for DOI 10.1145/3350426
View details for Web of Science ID 000583706000006
-
MEAN DIMENSION OF RIDGE FUNCTIONS
SIAM JOURNAL ON NUMERICAL ANALYSIS
2020; 58 (2): 1195–1216
View details for DOI 10.1137/19M127149X
View details for Web of Science ID 000546990100012
-
Optimizing the tie-breaker regression discontinuity design
ELECTRONIC JOURNAL OF STATISTICS
2020; 14 (2): 4004–27
View details for DOI 10.1214/20-EJS1765
View details for Web of Science ID 000587719400042
-
PERMUTATION p-VALUE APPROXIMATION VIA GENERALIZED STOLARSKY INVARIANCE
ANNALS OF STATISTICS
2019; 47 (1): 583–611
View details for DOI 10.1214/18-AOS1702
View details for Web of Science ID 000451778700020
-
Comment: Unreasonable Effectiveness of Monte Carlo
STATISTICAL SCIENCE
2019; 34 (1): 29–33
View details for DOI 10.1214/18-STS676
View details for Web of Science ID 000464350600003
-
Admissibility in Partial Conjunction Testing
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2019; 114 (525): 158–68
View details for DOI 10.1080/01621459.2017.1385465
View details for Web of Science ID 000471325500017
-
Importance sampling the union of rare events with an application to power systems analysis
ELECTRONIC JOURNAL OF STATISTICS
2019; 13 (1): 231–54
View details for DOI 10.1214/18-EJS1527
View details for Web of Science ID 000465088200008
-
EFFECTIVE DIMENSION OF SOME WEIGHTED PRE-SOBOLEV SPACES WITH DOMINATING MIXED PARTIAL DERIVATIVES
SIAM JOURNAL ON NUMERICAL ANALYSIS
2019; 57 (2): 547–62
View details for DOI 10.1137/17M1158975
View details for Web of Science ID 000466423000002
-
SINGLE NUGGET KRIGING
STATISTICA SINICA
2018; 28 (2): 649–69
View details for DOI 10.5705/ss.202016.0255
View details for Web of Science ID 000450211500006
-
CONFOUNDER ADJUSTMENT IN MULTIPLE HYPOTHESIS TESTING
ANNALS OF STATISTICS
2017; 45 (5): 1863-1894
Abstract
We consider large-scale studies in which thousands of significance tests are performed simultaneously. In some of these studies, the multiple testing procedure can be severely biased by latent confounding factors such as batch effects and unmeasured covariates that correlate with both primary variable(s) of interest (e.g., treatment variable, phenotype) and the outcome. Over the past decade, many statistical methods have been proposed to adjust for the confounders in hypothesis testing. We unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties. In particular, we provide theoretical guarantees for RUV-4 [Gagnon-Bartsch, Jacob and Speed (2013)] and LEAPP [Ann. Appl. Stat. 6 (2012) 1664-1688], which correspond to two different identification conditions in the framework: the first requires a set of "negative controls" that are known a priori to follow the null distribution; the second requires the true nonnulls to be sparse. Two different estimators which are based on RUV-4 and LEAPP are then applied to these two scenarios. We show that if the confounding factors are strong, the resulting estimators can be asymptotically as powerful as the oracle estimator which observes the latent confounding factors. For hypothesis testing, we show the asymptotic z-tests based on the estimators can control the type I error. Numerical experiments show that the false discovery rate is also controlled by the Benjamini-Hochberg procedure when the sample size is reasonably large.
View details for DOI 10.1214/16-AOS1511
View details for Web of Science ID 000416455300002
View details for PubMedID 31439967
View details for PubMedCentralID PMC6706069
-
Scrambled Geometric Net Integration Over General Product Spaces
FOUNDATIONS OF COMPUTATIONAL MATHEMATICS
2017; 17 (2): 467-496
View details for DOI 10.1007/s10208-015-9293-5
View details for Web of Science ID 000398888500004
-
On Shapley Value for Measuring Importance of Dependent Inputs
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION
2017; 5 (1): 986–1002
View details for DOI 10.1137/16M1097717
View details for Web of Science ID 000424574600037
-
Efficient moment calculations for variance components in large unbalanced crossed random effects models
ELECTRONIC JOURNAL OF STATISTICS
2017; 11 (1): 1235–96
View details for DOI 10.1214/17-EJS1236
View details for Web of Science ID 000408006600039
-
Statistically Efficient Thinning of a Markov Chain Sampler
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
2017; 26 (3): 738–44
View details for DOI 10.1080/10618600.2017.1336446
View details for Web of Science ID 000410916600025
-
Extensible grids: uniform sampling on a space filling curve
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2016; 78 (4): 917-931
View details for DOI 10.1111/rssb.12132
View details for Web of Science ID 000380720300010
-
A constraint on extensible quadrature rules
NUMERISCHE MATHEMATIK
2016; 132 (3): 511-518
View details for DOI 10.1007/s00211-015-0724-7
View details for Web of Science ID 000372170200004
-
Bi-Cross-Validation for Factor Analysis
STATISTICAL SCIENCE
2016; 31 (1): 119-139
View details for DOI 10.1214/15-STS539
View details for Web of Science ID 000370283600012
-
TRANSFORMATIONS AND HARDY-KRAUSE VARIATION
SIAM JOURNAL ON NUMERICAL ANALYSIS
2016; 54 (3): 1946-1966
View details for DOI 10.1137/15M1052184
View details for Web of Science ID 000385026000026
-
Optimal multiple testing under a Gaussian prior on the effect sizes
BIOMETRIKA
2015; 102 (4): 753-766
Abstract
We develop a new method for large-scale frequentist multiple testing with Bayesian prior information. We find optimal p-value weights that maximize the average power of the weighted Bonferroni method. Due to the nonconvexity of the optimization problem, previous methods that account for uncertain prior information are suitable for only a small number of tests. For a Gaussian prior on the effect sizes, we give an efficient algorithm that is guaranteed to find the optimal weights nearly exactly. Our method can discover new loci in genome-wide association studies and compares favourably to competitors. An open-source implementation is available.
View details for DOI 10.1093/biomet/asv050
View details for Web of Science ID 000366379000001
View details for PubMedID 27046938
View details for PubMedCentralID PMC4813057
-
Genome-Wide Scan Informed by Age-Related Disease Identifies Loci for Exceptional Human Longevity
PLOS GENETICS
2015; 11 (12): e1005728
Abstract
We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci to associate significantly with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence of four loci and suggestive evidence for one more with exceptional longevity. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer's disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer's disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.
View details for DOI 10.1371/journal.pgen.1005728
View details for Web of Science ID 000368518400057
View details for PubMedID 26677855
View details for PubMedCentralID PMC4683064
-
Moment based gene set tests
BMC BIOINFORMATICS
2015; 16
Abstract
Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests. We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|^2 permutations which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson's Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared. We have developed a moment based approximation to linear and quadratic gene set test statistics' permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org).
View details for DOI 10.1186/s12859-015-0571-7
View details for Web of Science ID 000353871900001
View details for PubMedID 25928861
View details for PubMedCentralID PMC4419444
-
LOW DISCREPANCY CONSTRUCTIONS IN THE TRIANGLE
SIAM JOURNAL ON NUMERICAL ANALYSIS
2015; 53 (2): 743-761
View details for DOI 10.1137/140960463
View details for Web of Science ID 000353844900003
-
Data enriched linear regression
ELECTRONIC JOURNAL OF STATISTICS
2015; 9 (1): 1078-1112
View details for DOI 10.1214/15-EJS1027
View details for Web of Science ID 000366268800036
-
The Sign of the Logistic Regression Coefficient
AMERICAN STATISTICIAN
2014; 68 (4): 297-301
View details for DOI 10.1080/00031305.2014.951128
View details for Web of Science ID 000345139300009
-
Higher order Sobol' indices
INFORMATION AND INFERENCE-A JOURNAL OF THE IMA
2014; 3 (1): 59–81
View details for DOI 10.1093/imaiai/iau001
View details for Web of Science ID 000218924200003
-
Sobol' Indices and Shapley Value
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION
2014; 2 (1): 245–51
View details for DOI 10.1137/130936233
View details for Web of Science ID 000421346900011
-
Self-concordance for empirical likelihood
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE
2013; 41 (3): 387-397
View details for DOI 10.1002/cjs.11183
View details for Web of Science ID 000322963400001
-
Better Estimation of Small Sobol' Sensitivity Indices
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION
2013; 23 (2)
View details for DOI 10.1145/2457459.2457460
View details for Web of Science ID 000318944000001
-
Variance Components and Generalized Sobol' Indices
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION
2013; 1 (1): 19–41
View details for DOI 10.1137/120876782
View details for Web of Science ID 000213796800003
-
Correct Ordering in the Zipf-Poisson Ensemble
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2012; 107 (500): 1510-1517
View details for DOI 10.1080/01621459.2012.734177
View details for Web of Science ID 000313394600020
-
MULTIPLE HYPOTHESIS TESTING ADJUSTED FOR LATENT VARIABLES, WITH AN APPLICATION TO THE AGEMAP GENE EXPRESSION DATA
ANNALS OF APPLIED STATISTICS
2012; 6 (4): 1664-1688
View details for DOI 10.1214/12-AOAS561
View details for Web of Science ID 000314458400014
-
BOOTSTRAPPING DATA ARRAYS OF ARBITRARY ORDER
ANNALS OF APPLIED STATISTICS
2012; 6 (3): 895-927
View details for DOI 10.1214/12-AOAS547
View details for Web of Science ID 000314457400004
-
A Sparse Transmission Disequilibrium Test for Haplotypes Based on Bradley-Terry Graphs
HUMAN HEREDITY
2012; 73 (1): 52-61
Abstract
Linkage and association analysis based on haplotype transmission disequilibrium can be more informative than single marker analysis. Several works have been proposed in recent years to extend the transmission disequilibrium test (TDT) to haplotypes. Among them, a powerful approach called the evolutionary tree TDT (ET-TDT) incorporates information about the evolutionary relationship among haplotypes using the cladogram of the locus. In this work we extend this approach by taking into consideration the sparsity of causal mutations in the evolutionary history. We first introduce the notion of a Bradley-Terry (BT) graph representation of a haplotype locus. The most important property of the BT graph is that sparsity of the edge set of the graph corresponds to small number of causal mutations in the evolution of the haplotypes. We then propose a method to test the null hypothesis of no linkage and association against sparse alternatives under which a small number of edges on the BT graph have non-nil effects. We compare the performance of our approach to that of the ET-TDT through a power study, and show that incorporating sparsity of causal mutations can significantly improve the power of a haplotype-based TDT.
View details for DOI 10.1159/000335937
View details for Web of Science ID 000302111100008
View details for PubMedID 22398955
View details for PubMedCentralID PMC3357149
-
Moment-Based Estimation of Stochastic Kronecker Graph Parameters
INTERNET MATHEMATICS
2012; 8 (3): 232–56
View details for DOI 10.1080/15427951.2012.680824
View details for Web of Science ID 000217675800002
-
Outlier Detection Using Nonconvex Penalized Regression
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2011; 106 (494): 626-639
View details for DOI 10.1198/jasa.2011.tm10390
View details for Web of Science ID 000293113300022
-
CONSISTENCY OF MARKOV CHAIN QUASI-MONTE CARLO ON CONTINUOUS STATE SPACES
ANNALS OF STATISTICS
2011; 39 (2): 673-701
View details for DOI 10.1214/10-AOS831
View details for Web of Science ID 000291183300001
-
Visualizing bivariate long-tailed data
ELECTRONIC JOURNAL OF STATISTICS
2011; 5: 642-668
View details for DOI 10.1214/11-EJS622
View details for Web of Science ID 000293080600001
-
EMPIRICAL STATIONARY CORRELATIONS FOR SEMI-SUPERVISED LEARNING ON GRAPHS
ANNALS OF APPLIED STATISTICS
2010; 4 (2): 589-614
View details for DOI 10.1214/09-AOAS293
View details for Web of Science ID 000283528500004
-
A Rotation Test to Verify Latent Structure
JOURNAL OF MACHINE LEARNING RESEARCH
2010; 11: 603-624
View details for Web of Science ID 000277186500006
-
KARL PEARSON'S META-ANALYSIS REVISITED
ANNALS OF STATISTICS
2009; 37 (6B): 3867-3892
View details for DOI 10.1214/09-AOS697
View details for Web of Science ID 000271673700006
-
Aging Mice Show a Decreasing Correlation of Gene Expression within Genetic Modules
PLOS GENETICS
2009; 5 (12)
Abstract
In this work we present a method for the differential analysis of gene co-expression networks and apply this method to look for large-scale transcriptional changes in aging. We derived synonymous gene co-expression networks from AGEMAP expression data for 16-month-old and 24-month-old mice. We identified a number of functional gene groups that change co-expression with age. Among these changing groups we found a trend towards declining correlation with age. In particular, we identified a modular (as opposed to uniform) decline in general correlation with age. We identified potential transcriptional mechanisms that may aid in modular correlation decline. We found that computationally identified targets of the NF-KappaB transcription factor decrease expression correlation with age. Finally, we found that genes that are prone to declining co-expression tend to be co-located on the chromosome. Our results conclude that there is a modular decline in co-expression with age in mice. They also indicate that factors relating to both chromosome domains and specific transcription factors may contribute to the decline.
View details for DOI 10.1371/journal.pgen.1000776
View details for Web of Science ID 000273469700026
View details for PubMedID 20019809
View details for PubMedCentralID PMC2788246
-
BI-CROSS-VALIDATION OF THE SVD AND THE NONNEGATIVE MATRIX FACTORIZATION
ANNALS OF APPLIED STATISTICS
2009; 3 (2): 564-594
View details for DOI 10.1214/08-AOAS227
View details for Web of Science ID 000271979600004
-
Properties of Balanced Permutations
JOURNAL OF COMPUTATIONAL BIOLOGY
2009; 16 (4): 625-638
Abstract
This paper takes a close look at balanced permutations, a recently developed sample reuse method with applications in bioinformatics. It turns out that balanced permutation reference distributions do not have the correct null behavior, which can be traced to their lack of a group structure. We find that they can give p-values that are too permissive to varying degrees. In particular the observed test statistic can be larger than that of all B balanced permutations of a data set with a probability much higher than 1/(B + 1), even under the null hypothesis.
View details for DOI 10.1089/cmb.2008.0144
View details for Web of Science ID 000265551400007
View details for PubMedID 19361331
View details for PubMedCentralID PMC3148117
-
Recycling physical random numbers
ELECTRONIC JOURNAL OF STATISTICS
2009; 3: 1531-1541
View details for DOI 10.1214/09-EJS541
View details for Web of Science ID 000207855300028
-
Calibration of the empirical likelihood method for a vector mean
ELECTRONIC JOURNAL OF STATISTICS
2009; 3: 1161-1192
View details for DOI 10.1214/09-EJS518
View details for Web of Science ID 000207855300016
-
Monte Carlo and Quasi-Monte Carlo for Statistics
8th International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (MCQMC 08)
SPRINGER-VERLAG BERLIN. 2009: 3–18
View details for DOI 10.1007/978-3-642-04107-5_1
View details for Web of Science ID 000282063700001
-
LOCAL ANTITHETIC SAMPLING WITH SCRAMBLED NETS
ANNALS OF STATISTICS
2008; 36 (5): 2319-2343
View details for DOI 10.1214/07-AOS548
View details for Web of Science ID 000260554100012
-
Construction of weakly CUD sequences for MCMC sampling
ELECTRONIC JOURNAL OF STATISTICS
2008; 2: 634-660
View details for DOI 10.1214/07-EJS162
View details for Web of Science ID 000207854400026
-
THE PIGEONHOLE BOOTSTRAP
ANNALS OF APPLIED STATISTICS
2007; 1 (2): 386-411
View details for DOI 10.1214/07-AOAS122
View details for Web of Science ID 000261057600007
-
AGEMAP: A gene expression database for aging in mice
PLOS GENETICS
2007; 3 (11): 2326-2337
Abstract
We present the AGEMAP (Atlas of Gene Expression in Mouse Aging Project) gene expression database, which is a resource that catalogs changes in gene expression as a function of age in mice. The AGEMAP database includes expression changes for 8,932 genes in 16 tissues as a function of age. We found great heterogeneity in the amount of transcriptional changes with age in different tissues. Some tissues displayed large transcriptional differences in old mice, suggesting that these tissues may contribute strongly to organismal decline. Other tissues showed few or no changes in expression with age, indicating strong levels of homeostasis throughout life. Based on the pattern of age-related transcriptional changes, we found that tissues could be classified into one of three aging processes: (1) a pattern common to neural tissues, (2) a pattern for vascular tissues, and (3) a pattern for steroid-responsive tissues. We observed that different tissues age in a coordinated fashion in individual mice, such that certain mice exhibit rapid aging, whereas others exhibit slow aging for multiple tissues. Finally, we compared the transcriptional profiles for aging in mice to those from humans, flies, and worms. We found that genes involved in the electron transport chain show common age regulation in all four species, indicating that these genes may be exceptionally good markers of aging. However, we saw no overall correlation of age regulation between mice and humans, suggesting that aging processes in mice and humans may be fundamentally different.
View details for DOI 10.1371/journal.pgen.0030201
View details for Web of Science ID 000251310200024
View details for PubMedID 18081424
View details for PubMedCentralID PMC2098796
-
Infinitely imbalanced logistic regression
JOURNAL OF MACHINE LEARNING RESEARCH
2007; 8: 761-773
View details for Web of Science ID 000247002800002
-
A robust hybrid of lasso and ridge regression
AMS/IMS/SIAM Joint Summer Research Conference on Machine and Statistical Learning - Prediction and Discovery
AMER MATHEMATICAL SOC. 2007: 59–71
View details for Web of Science ID 000250954400006
-
Halton sequences avoid the origin
SIAM REVIEW
2006; 48 (3): 487-503
View details for DOI 10.1137/S0036144504441573
View details for Web of Science ID 000239945600002
-
Transcriptional profiling of aging in human muscle reveals a common aging signature
PLOS GENETICS
2006; 2 (7): 1058-1069
Abstract
We analyzed expression of 81 normal muscle samples from humans of varying ages, and have identified a molecular profile for aging consisting of 250 age-regulated genes. This molecular profile correlates not only with chronological age but also with a measure of physiological age. We compared the transcriptional profile of muscle aging to previous transcriptional profiles of aging in the kidney and the brain, and found a common signature for aging in these diverse human tissues. The common aging signature consists of six genetic pathways; four pathways increase expression with age (genes in the extracellular matrix, genes involved in cell growth, genes encoding factors involved in complement activation, and genes encoding components of the cytosolic ribosome), while two pathways decrease expression with age (genes involved in chloride transport and genes encoding subunits of the mitochondrial electron transport chain). We also compared transcriptional profiles of aging in humans to those of the mouse and fly, and found that the electron transport chain pathway decreases expression with age in all three organisms, suggesting that this may be a public marker for aging across species.
View details for DOI 10.1371/journal.pgen.0020115
View details for PubMedID 16789832
-
Estimating mean dimensionality of analysis of variance decompositions
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2006; 101 (474): 712-721
View details for DOI 10.1198/016214505000001410
View details for Web of Science ID 000238033200025
-
On the Warnock-Halton quasi-standard error
MONTE CARLO METHODS AND APPLICATIONS
2006; 12 (1): 47–54
View details for DOI 10.1163/156939606776886652
View details for Web of Science ID 000416663500003
-
Quasi-Monte Carlo for integrands with point singularities at unknown locations
6th International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing
SPRINGER-VERLAG BERLIN. 2006: 403–417
View details for Web of Science ID 000235235900024
-
A quasi-Monte Carlo Metropolis algorithm
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2005; 102 (25): 8844-8849
Abstract
This work presents a version of the Metropolis-Hastings algorithm using quasi-Monte Carlo inputs. We prove that the method yields consistent estimates in some problems with finite state spaces and completely uniformly distributed inputs. In some numerical examples, the proposed method is much more accurate than ordinary Metropolis-Hastings sampling.
View details for DOI 10.1073/pnas.0409596102
View details for Web of Science ID 000230049500012
View details for PubMedID 15956207
View details for PubMedCentralID PMC1150275
-
Control variates for quasi-Monte Carlo
STATISTICAL SCIENCE
2005; 20 (1): 1-18
View details for DOI 10.1214/088342304000000468
View details for Web of Science ID 000229906000001
-
Variance of the number of false discoveries
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2005; 67: 411-426
View details for Web of Science ID 000229902600006
-
A transcriptional profile of aging in the human kidney
PLOS BIOLOGY
2004; 2 (12): 2191-2201
Abstract
In this study, we found 985 genes that change expression in the cortex and the medulla of the kidney with age. Some of the genes whose transcripts increase in abundance with age are known to be specifically expressed in immune cells, suggesting that immune surveillance or inflammation increases with age. The age-regulated genes show a similar aging profile in the cortex and the medulla, suggesting a common underlying mechanism for aging. Expression profiles of these age-regulated genes mark not only age, but also the relative health and physiology of the kidney in older individuals. Finally, the set of aging-regulated kidney genes suggests specific mechanisms and pathways that may play a role in kidney degeneration with age.
View details for DOI 10.1371/journal.pbio.0020427
View details for PubMedID 15562319
-
The host response to smallpox: Analysis of the gene expression program in peripheral blood cells in a nonhuman primate model
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (42): 15190-15195
Abstract
Smallpox has played an unparalleled role in human history and remains a significant potential threat to public health. Despite the historical significance of this disease, we know little about the underlying pathophysiology or the virulence mechanisms of the causative agent, variola virus. To improve our understanding of variola pathogenesis and variola-host interactions, we examined the molecular and cellular features of hemorrhagic smallpox in cynomolgus macaques. We used cDNA microarrays to analyze host gene expression patterns in sequential blood samples from each of 22 infected animals. Variola infection elicited striking and temporally coordinated patterns of gene expression in peripheral blood. Of particular interest were features that appear to represent an IFN response, cell proliferation, immunoglobulin gene expression, viral dose-dependent gene expression patterns, and viral modulation of the host immune response. The virtual absence of a tumor necrosis factor α/NF-κB-activated transcriptional program in the face of an overwhelming systemic infection suggests that variola gene products may ablate this response. These results provide a detailed picture of the host transcriptional response during smallpox infection, and may help guide the development of diagnostic, therapeutic, and prophylactic strategies.
View details for DOI 10.1073/pnas.0405759101
View details for Web of Science ID 000224688700039
View details for PubMedID 15477590
View details for PubMedCentralID PMC523453
-
Genomic research and human subject privacy
SCIENCE
2004; 305 (5681): 183-183
View details for Web of Science ID 000222501000030
View details for PubMedID 15247459
-
Nomogram for predicting the likelihood of delayed graft function in adult cadaveric renal transplant recipients
JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY
2003; 14 (11): 2967–74
Abstract
Delayed graft function (DGF) is the need for dialysis in the first week after transplantation. Studied were risk factors for DGF in adult (age ≥16 yr) cadaveric renal transplant recipients by means of a multivariable modeling procedure. Only donor and recipient factors known before transplantation were chosen so that the probabilities of DGF could be calculated before transplantation and appropriate preventative measures taken. Data on 19,706 recipients of cadaveric allografts were obtained from the United States Renal Data System registry (1995 to 1998). Graft losses within the first 24 h after surgery were excluded from the analysis (n = 89). Patients whose DGF information was missing or unknown (n = 2820) and patients missing one or more candidate predictors (n = 2951) were also excluded. By means of a multivariable logistic regression analysis, factors contributing to DGF in the remaining 13,846 patients were identified. After validating the logistic regression model, a nomogram was developed as a tool for identifying patients at risk for DGF. The incidence of DGF was 23.7%. Sixteen independent donor or recipient risk factors were found to predict DGF. A nomogram quantifying the relative contribution of each risk factor was created. This index can be used to calculate the risk of DGF for an individual by adding the points associated with each risk factor. The nomogram provides a useful tool for developing a pretransplantation index of the likelihood of DGF occurrence. With this index in hand, better informed treatment and allocation decisions can be made.
View details for DOI 10.1097/01.ASN.0000093254.31868.85
View details for Web of Science ID 000186073500032
View details for PubMedID 14569108
-
A gene recommender algorithm to identify coexpressed genes in C. elegans
GENOME RESEARCH
2003; 13 (8): 1828-1837
Abstract
One of the most important uses of whole-genome expression data is for the discovery of new genes with similar function to a given list of genes (the query) already known to have closely related function. We have developed an algorithm, called the gene recommender, that ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated. We used the gene recommender to find other genes coexpressed with several sets of query genes, including genes known to function in the retinoblastoma complex. Genetic experiments confirmed that one gene (JC8.6) identified by the gene recommender acts with lin-35 Rb to regulate vulval cell fates, and that another gene (wrm-1) acts antagonistically. We find that the gene recommender returns lists of genes with better precision, for fixed levels of recall, than lists generated using the C. elegans expression topomap.
View details for DOI 10.1101/gr.1125403
View details for Web of Science ID 000184530900005
View details for PubMedID 12902378
View details for PubMedCentralID PMC403774
-
A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2003; 100 (14): 8348-8353
Abstract
Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
View details for DOI 10.1073/pnas.0832373100
View details for Web of Science ID 000184222500057
View details for PubMedID 12826619
View details for PubMedCentralID PMC166232
-
Monthly Strontium/Calcium oscillations in symbiotic coral aragonite: Biological effects limiting the precision of the paleotemperature proxy
GEOPHYSICAL RESEARCH LETTERS
2003; 30 (7)
View details for DOI 10.1029/2002GL016864
View details for Web of Science ID 000182592200002
-
Quasi-regression with shrinkage
3rd IMACS Seminar on Monte Carlo Methods (MCM 2001)
ELSEVIER SCIENCE BV. 2003: 231–41
View details for Web of Science ID 000181605300003
-
Data squashing by empirical likelihood
DATA MINING AND KNOWLEDGE DISCOVERY
2003; 7 (1): 101-113
View details for Web of Science ID 000179705200005
-
The dimension distribution and quadrature test functions
STATISTICA SINICA
2003; 13 (1): 1-17
View details for Web of Science ID 000181313600001
-
Plaid models for gene expression data
STATISTICA SINICA
2002; 12 (1): 61-86
View details for Web of Science ID 000174372800005
-
Quasi-regression
Workshop on the Complexity of Multivariate Problems
ACADEMIC PRESS INC ELSEVIER SCIENCE. 2001: 588–607
View details for DOI 10.1006/jcom.2001.0588
View details for Web of Science ID 000173453400002
-
Safe and effective importance sampling
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2000; 95 (449): 135-143
View details for Web of Science ID 000087845100017
-
Assessing linearity in high dimensions
ANNALS OF STATISTICS
2000; 28 (1): 1-19
View details for Web of Science ID 000088077700001
-
Advances in importance sampling
Computational Finance 1999 Conference
M I T PRESS. 2000: 53–65
View details for Web of Science ID 000088555700004
-
Monte Carlo, quasi-Monte Carlo, and randomized quasi-Monte Carlo
3rd International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (MCQM 98)
SPRINGER-VERLAG BERLIN. 2000: 86–97
View details for Web of Science ID 000089259100005
-
Scrambling Sobol' and Niederreiter-Xing points
JOURNAL OF COMPLEXITY
1998; 14 (4): 466-489
View details for Web of Science ID 000078083600003
-
Monte Carlo extension of quasi-Monte Carlo
1998 Winter Simulation Conference on Simulation in the 21st-Century (WSC 98)
IEEE. 1998: 571–577
View details for Web of Science ID 000078340100077
-
Scrambled net variance for integrals of smooth functions
ANNALS OF STATISTICS
1997; 25 (4): 1541-1562
View details for Web of Science ID 000079134900009
-
NONPARAMETRIC LIKELIHOOD CONFIDENCE BANDS FOR A DISTRIBUTION FUNCTION
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
1995; 90 (430): 516-521
View details for Web of Science ID A1995RA10400016
-
CONTROLLING CORRELATIONS IN LATIN HYPERCUBE SAMPLES
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
1994; 89 (428): 1517-1522
View details for Web of Science ID A1994PU33000037
-
ASYMPTOTICALLY OPTIMAL BALLOON DENSITY ESTIMATES
JOURNAL OF MULTIVARIATE ANALYSIS
1994; 51 (2): 352-371
View details for Web of Science ID A1994PU37200009
-
LATTICE SAMPLING REVISITED - MONTE-CARLO VARIANCE OF MEANS OVER RANDOMIZED ORTHOGONAL ARRAYS
ANNALS OF STATISTICS
1994; 22 (2): 930-945
View details for Web of Science ID A1994PN35700020
-
OVERFITTING IN NEURAL NETWORKS
26th Symposium on the Interface of Computing Science and Statistics - Computationally Intensive Statistical Methods
INTERFACE FOUNDATION NORTH AMERICA. 1994: 57–62
View details for Web of Science ID A1994BD22V00009
-
NEURAL NETWORKS AND RELATED METHODS FOR CLASSIFICATION - DISCUSSION
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
1994; 56 (3): 437-456
View details for Web of Science ID A1994NK40900002
-
ORTHOGONAL ARRAYS FOR COMPUTER EXPERIMENTS, INTEGRATION AND VISUALIZATION
STATISTICA SINICA
1992; 2 (2): 439-452
View details for Web of Science ID A1992JQ01200007
-
A CENTRAL-LIMIT-THEOREM FOR LATIN HYPERCUBE SAMPLING
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL
1992; 54 (2): 541-551
View details for Web of Science ID A1992HL62000012
-
EMPIRICAL LIKELIHOOD FOR LINEAR-MODELS
ANNALS OF STATISTICS
1991; 19 (4): 1725-1747
View details for Web of Science ID A1991GZ66100002
-
MULTIVARIATE ADAPTIVE REGRESSION SPLINES - DISCUSSION
ANNALS OF STATISTICS
1991; 19 (1): 102-112
View details for Web of Science ID A1991FF04700007
-
EMPIRICAL LIKELIHOOD RATIO CONFIDENCE-REGIONS
ANNALS OF STATISTICS
1990; 18 (1): 90-120
View details for Web of Science ID A1990DA37500004
-
USING SIMULATORS TO MODEL TRANSMITTED VARIABILITY IN IC MANUFACTURING
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING
1989; 2 (3): 82-93
View details for Web of Science ID A1989AD91600003
-
EMPIRICAL LIKELIHOOD RATIO CONFIDENCE-INTERVALS FOR A SINGLE FUNCTIONAL
BIOMETRIKA
1988; 75 (2): 237-249
View details for Web of Science ID A1988N941300007
-
SMOOTHING WITH SPLIT LINEAR FITS
TECHNOMETRICS
1986; 28 (3): 195-208
View details for Web of Science ID A1986D449800002
-
STATISTICS, IMAGES, AND PATTERN-RECOGNITION - DISCUSSION
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE
1986; 14 (2): 102-111
View details for Web of Science ID A1986D245700002