David Siegmund
John D. and Sigrid Banks Professor, Emeritus
Statistics
2024-25 Courses
Independent Studies (2)
- Industrial Research for Statisticians: STATS 398 (Aut, Win, Spr)
- Research: STATS 399 (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Sequential Analysis: STATS 223, STATS 323 (Aut)
2022-23 Courses
- Consulting Workshop: STATS 390 (Win)
- Literature of Statistics: STATS 319 (Win)
- Theory of Probability III: MATH 230C, STATS 310C (Spr)
2021-22 Courses
- Consulting Workshop: STATS 390 (Win)
- Literature of Statistics: STATS 319 (Aut)
- Sequential Analysis: STATS 223, STATS 323 (Win)
All Publications
-
SEGMENTATION AND ESTIMATION OF CHANGE-POINT MODELS: FALSE POSITIVE CONTROL AND CONFIDENCE REGIONS
ANNALS OF STATISTICS
2020; 48 (3): 1615-1647
View details for DOI 10.1214/19-AOS1861
View details for Web of Science ID 000551644000017
-
A genome-wide approach for detecting novel insertion-deletion variants of mid-range size.
Nucleic acids research
2016; 44 (15)
Abstract
We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.
View details for DOI 10.1093/nar/gkw481
View details for PubMedID 27325742
View details for PubMedCentralID PMC5009736
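As a rough illustration of the read-depth ingredient only (SWAN also combines read-pair and one-end-mapped evidence, which is not shown), the sketch below scores each position by a Poisson log-likelihood ratio of a reduced rate `rho * baseline` against the baseline rate, then finds the contiguous segment with the largest summed ratio. The function names, the fixed `rho` (0.5 would correspond to a heterozygous deletion), and the max-subarray search are assumptions for illustration; this is not the SWAN R package's interface.

```python
import math

import numpy as np


def poisson_depth_llr(counts, baseline_rate, rho):
    """Per-position log LR of Poisson(rho * baseline) vs Poisson(baseline) read depth.

    Hypothetical helper: log[Pois(x; rho*lam) / Pois(x; lam)] = x*log(rho) - lam*(rho - 1).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    x = np.asarray(counts, dtype=float)
    lam = np.asarray(baseline_rate, dtype=float)
    return x * np.log(rho) - lam * (rho - 1.0)


def best_segment(llr):
    """Max-subarray (Kadane) search: half-open segment [start, end) with largest summed LLR."""
    best, best_seg = -math.inf, None
    run, start = 0.0, 0
    for i, v in enumerate(llr):
        if run <= 0:
            run, start = float(v), i  # restart the running segment here
        else:
            run += float(v)  # extend the current segment
        if run > best:
            best, best_seg = run, (start, i + 1)
    return best_seg, best
```

For example, depth 30 everywhere except 15 over five positions yields the segment covering exactly those five positions when `rho=0.5`.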
-
POISSON APPROXIMATION FOR TWO SCAN STATISTICS WITH RATES OF CONVERGENCE
ANNALS OF APPLIED PROBABILITY
2016; 26 (4): 2384-2418
View details for DOI 10.1214/15-AAP1150
View details for Web of Science ID 000383411200014
-
SCAN STATISTICS ON POISSON RANDOM FIELDS WITH APPLICATIONS IN GENOMICS
ANNALS OF APPLIED STATISTICS
2016; 10 (2): 726-755
View details for DOI 10.1214/15-AOAS892
View details for Web of Science ID 000385029700008
-
HIGHER CRITICISM: p-VALUES AND CRITICISM
ANNALS OF STATISTICS
2015; 43 (3): 1323-1350
View details for DOI 10.1214/15-AOS1312
View details for Web of Science ID 000355768700013
-
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION
ANNALS OF STATISTICS
2013; 41 (2): 670-692
View details for DOI 10.1214/13-AOS1094
View details for Web of Science ID 000320488200010
-
Change-Points: From Sequential Detection to Biology and Back
SEQUENTIAL ANALYSIS-DESIGN METHODS AND APPLICATIONS
2013; 32 (1): 2-14
View details for DOI 10.1080/07474946.2013.751834
View details for Web of Science ID 000323819400002
-
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION
Information Theory and Applications Workshop
IEEE. 2013
View details for Web of Science ID 000321214400067
-
MODEL SELECTION FOR HIGH-DIMENSIONAL, MULTI-SEQUENCE CHANGE-POINT PROBLEMS
STATISTICA SINICA
2012; 22 (4): 1507-1538
View details for DOI 10.5705/ss.2010.257
View details for Web of Science ID 000311812800014
-
Spectrum Opportunity Detection with Weak and Correlated Signals
46th Asilomar Conference on Signals, Systems and Computers
IEEE. 2012: 128–132
View details for Web of Science ID 000320768400022
-
False discovery rate for scanning statistics
BIOMETRIKA
2011; 98 (4): 979-985
View details for DOI 10.1093/biomet/asr057
View details for Web of Science ID 000297366000016
-
DETECTING SIMULTANEOUS VARIANT INTERVALS IN ALIGNED SEQUENCES
ANNALS OF APPLIED STATISTICS
2011; 5 (2A): 645-668
View details for DOI 10.1214/10-AOAS400
View details for Web of Science ID 000295453300003
-
Joint Testing of Genotype and Ancestry Association in Admixed Families
GENETIC EPIDEMIOLOGY
2010; 34 (8): 783-791
Abstract
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.
View details for DOI 10.1002/gepi.20520
View details for Web of Science ID 000284719100002
View details for PubMedID 21031451
View details for PubMedCentralID PMC3103820
-
Detecting simultaneous changepoints in multiple sequences
BIOMETRIKA
2010; 97 (3): 631-645
Abstract
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
View details for DOI 10.1093/biomet/asq025
View details for Web of Science ID 000280904000008
View details for PubMedID 22822250
View details for PubMedCentralID PMC3372242
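The scan described above can be sketched by brute force, assuming each sample's sequence has already been standardized to mean 0 and variance 1 under the null. For every interval up to a hypothetical maximum width, each sample contributes the square of its standardized interval sum (a chi-squared statistic), and the per-sample statistics are summed. The analytic significance approximations that are the paper's main contribution are not reproduced here.

```python
import numpy as np


def multisample_chi2_scan(Y, max_width):
    """Best interval (s, t] by the sum over samples of per-sample chi-squared statistics.

    Y: array of shape (n_samples, n_positions), assumed standardized under the null.
    max_width: hypothetical cap on interval length.
    """
    Y = np.asarray(Y, dtype=float)
    n_samples, T = Y.shape
    # Leading zero column so that S[:, t] - S[:, s] == Y[:, s:t].sum(axis=1).
    S = np.concatenate([np.zeros((n_samples, 1)), np.cumsum(Y, axis=1)], axis=1)
    best_stat, best_interval = -np.inf, None
    for s in range(T):
        for t in range(s + 1, min(s + max_width, T) + 1):
            z = (S[:, t] - S[:, s]) / np.sqrt(t - s)  # standardized interval sum per sample
            stat = float(np.sum(z ** 2))  # sum of chi-squared statistics across samples
            if stat > best_stat:
                best_stat, best_interval = stat, (s, t)
    return best_interval, best_stat
```

Pooling is what lets a signal carried by only a few samples stand out: in a noise-free check with five samples, two of which carry a shift over positions 20 to 29, the scan returns exactly that interval.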
-
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
GENETIC EPIDEMIOLOGY
2009; 33 (7): 617-627
Abstract
Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which in contrast to the likelihood ratio statistic can use nonparametric estimators of variability to achieve robustness of the false-positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity by descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study.
View details for DOI 10.1002/gepi.20413
View details for Web of Science ID 000271406100006
View details for PubMedID 19278016
View details for PubMedCentralID PMC2766029
-
Minimax optimality of the Shiryayev-Roberts change-point detection rule
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
2008; 138 (9): 2815-2825
View details for DOI 10.1016/j.jspi.2008.03.016
View details for Web of Science ID 000256602600019
-
The distribution of maxima of approximately Gaussian random fields
ANNALS OF STATISTICS
2008; 36 (3): 1375-1403
View details for DOI 10.1214/07-AOS511
View details for Web of Science ID 000256504400014
-
Detecting the emergence of a signal in a noisy image
STATISTICS AND ITS INTERFACE
2008; 1 (1): 3-12
View details for Web of Science ID 000207654700002
-
A unified framework for linkage and association analysis of quantitative traits
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (51): 20210-20215
Abstract
We give a unified treatment of the statistical foundations of population-based association mapping and of family-based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.
View details for DOI 10.1073/pnas.0707138105
View details for Web of Science ID 000251885000013
View details for PubMedID 18077372
View details for PubMedCentralID PMC2154410
-
Importance sampling for estimating p values in linkage analysis
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2007; 102 (479): 929-937
View details for DOI 10.1198/016214507000000680
View details for Web of Science ID 000249752300021
-
A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data
BIOMETRICS
2007; 63 (1): 22-32
Abstract
In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.
View details for DOI 10.1111/j.1541-0420.2006.00662.x
View details for Web of Science ID 000244647100003
View details for PubMedID 17447926
-
Statistical corrections of linkage data suggest predominantly cis regulations of gene expression.
BMC proceedings
2007; 1: S145
Abstract
Morley et al. (Nature 2004, 430:743-747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD ≈ 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).
View details for PubMedID 18466489
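The abstract quotes thresholds on both the Z scale and the LOD scale (Z = 6.4 alongside LOD = 8.9). They are linked by the standard relation 2 ln(10) · LOD = Z², where 2 ln(10) · LOD is the likelihood-ratio chi-square statistic. A minimal sketch of the conversion, with hypothetical function names:

```python
import math

LN10 = math.log(10.0)


def lod_to_z(lod):
    """Z = sqrt(2 ln(10) * LOD); 2 ln(10) * LOD is the likelihood-ratio chi-square."""
    if lod < 0:
        raise ValueError("LOD must be non-negative for this signed-root conversion")
    return math.sqrt(2.0 * LN10 * lod)


def z_to_lod(z):
    """Inverse conversion: LOD = Z^2 / (2 ln(10))."""
    return z * z / (2.0 * LN10)
```

With these, `lod_to_z(8.9)` is about 6.40, matching the abstract's pairing, and `lod_to_z(5.3)` is about 4.94.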
-
Approximating the variance of the conditional probability of the state of a hidden Markov model
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
2007; 6
Abstract
In a hidden Markov model, one "estimates" the state of the hidden Markov chain at time t by using the forwards-backwards algorithm to compute the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe directly the state of the hidden process. In the case where changes of state occur slowly relative to the speed at which information about the underlying state accumulates in the observed data, we approximate these covariances in terms of functionals of Brownian motion that arise in change-point analysis. Applications in gene mapping, where these covariances play a role in standardizing the score statistic and in evaluating the loss of noncentrality due to incomplete information, are discussed. Numerical examples illustrate the range of validity and limitations of our results.
View details for Web of Science ID 000252387000001
View details for PubMedID 17672820
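For small examples the quantities this paper approximates can be computed exactly. The sketch below runs a scaled forwards-backwards pass for a discrete-state hidden Markov model, then forms the conditional covariance of the state indicator vector, diag(g) - g gᵀ, at each time. The input conventions (emission likelihoods supplied as a matrix, row-stochastic transition matrix) and function names are assumptions; the Brownian-motion approximations themselves are not implemented.

```python
import numpy as np


def posterior_states(pi, P, lik):
    """Scaled forwards-backwards pass for a discrete-state HMM.

    pi: (K,) initial distribution; P: (K, K) transitions, P[i, j] = Pr(next = j | current = i);
    lik: (T, K) emission likelihoods Pr(y_t | state k).
    Returns gamma: (T, K) conditional state probabilities given all observations.
    """
    pi, P, lik = (np.asarray(a, dtype=float) for a in (pi, P, lik))
    T, K = lik.shape
    alpha = np.zeros((T, K))
    c = np.zeros(T)  # per-step scaling constants to avoid numerical underflow
    alpha[0] = pi * lik[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * lik[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (lik[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def state_covariances(gamma):
    """Conditional covariance of the state indicator vector at each t: diag(g) - g g^T."""
    return np.array([np.diag(g) - np.outer(g, g) for g in gamma])
```

With two states the diagonal entry reduces to p(1 - p), which is largest when the data are least informative about the hidden state.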
-
QTL mapping under ascertainment
ANNALS OF HUMAN GENETICS
2006; 70: 867-881
Abstract
Mapping quantitative trait loci (QTL) using ascertained sibships is discussed. It is shown that under the standard normality assumption of variance components analysis the efficient scores are unchanged by ascertainment, and two different schemes of ascertainment correction suggested in the literature are asymptotically equivalent. The use of conditional maximum likelihood estimators derived under the normality assumption to estimate nuisance parameters is shown to result in only a small loss of power compared to the case of known parameters, even when the distribution of phenotypes is non-normal and/or the ascertainment criterion is ill defined.
View details for DOI 10.1111/j.1469-1809.2006.00286.x
View details for Web of Science ID 000241191400019
View details for PubMedID 17044862
-
Spatial regulation and the rate of signal transduction activation
PLOS COMPUTATIONAL BIOLOGY
2006; 2 (5): 343-349
Abstract
Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.
View details for DOI 10.1371/journal.pcbi.0020044
View details for Web of Science ID 000239493900003
View details for PubMedID 16699596
View details for PubMedCentralID PMC1458967
-
Genome scans with gene-covariate interaction
GENETIC EPIDEMIOLOGY
2005; 29 (3): 173-184
Abstract
Genetic models for gene-covariate interactions are described. Methods of linkage analysis that utilize special features of these models and the corresponding score statistics are derived. Their power is compared with that of simple genome scans that ignore these special features, and substantial gains in power are observed when the gene-covariate interaction is strong. Quantitative trait mapping in randomly ascertained sibships and affected sibpair mapping are discussed. For the latter case, a simpler statistic is proposed that has similar performance to the score statistic, but does not require the estimation of nuisance parameters. Since the nuisance parameters are not estimable solely from affected sib-pair data, this statistic would be much easier to apply in practice. Similarities with linkage analysis of models for longitudinal data and multivariate phenotypes are also briefly discussed. Approximations for the P-value and power are derived under the framework of local alternatives.
View details for DOI 10.1002/gepi.20100
View details for Web of Science ID 000233059200001
View details for PubMedID 16216012
-
An urn model of Diaconis
ANNALS OF PROBABILITY
2005; 33 (5): 2036-2042
View details for DOI 10.1214/009117905000000314
View details for Web of Science ID 000232345300012
-
On the power for linkage detection using a test based on scan statistics
BIOSTATISTICS
2005; 6 (2): 259-269
Abstract
We analyze some aspects of scan statistics, which have been proposed to help detect weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity by descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.
View details for Web of Science ID 000228428200007
View details for PubMedID 15772104
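The moving-average scan statistic described above can be sketched as follows, assuming a hypothetical input of per-marker allele-sharing proportions (already averaged over relative pairs) along one chromosome and a window of `k` contiguous markers. Setting a significance threshold requires the null distribution of the maximum, which the paper approximates and this sketch does not.

```python
import numpy as np


def moving_average_scan(sharing, k):
    """Start index and value of the largest mean sharing over k contiguous markers.

    sharing: 1-D array of per-marker IBD sharing proportions (hypothetical input).
    """
    x = np.asarray(sharing, dtype=float)
    if not 1 <= k <= len(x):
        raise ValueError("window length k must be between 1 and len(sharing)")
    # mode="valid" keeps only full windows: len(x) - k + 1 moving averages.
    window_means = np.convolve(x, np.ones(k) / k, mode="valid")
    best = int(np.argmax(window_means))
    return best, float(window_means[best])
```

With `k = 1` this reduces to the customary single-marker maximum; larger `k` pools neighboring markers, which is the trade-off the abstract's power comparison addresses.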
-
The admixture model in linkage analysis
JOURNAL OF STATISTICAL PLANNING AND INFERENCE
2005; 130 (1-2): 317-324
View details for DOI 10.1016/j.jspi.2003.07.022
View details for Web of Science ID 000226645200019
-
Model selection in irregular problems: Applications to mapping quantitative trait loci
BIOMETRIKA
2004; 91 (4): 785-800
View details for Web of Science ID 000225940000002
-
A report on the future of statistics
STATISTICAL SCIENCE
2004; 19 (3): 387-407
View details for DOI 10.1214/088342304000000404
View details for Web of Science ID 000227884700001
-
Mapping quantitative traits with random and with ascertained sibships
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (21): 7845-7850
Abstract
Use of a robust score statistic based on a variance components model to map quantitative trait loci in randomly sampled pedigrees is reviewed. Sibships ascertained through a single proband are discussed. Under a standard assumption of multivariate normality, two suggested methods of ascertainment correction are shown to be asymptotically equivalent when the number of sibships is large.
View details for DOI 10.1073/pnas.0401713101
View details for Web of Science ID 000221652000003
View details for PubMedID 15084737
View details for PubMedCentralID PMC419519
-
Stochastic model of protein-protein interaction: Why signaling proteins need to be colocalized
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (17): 6445-6449
Abstract
Colocalization of proteins that are part of the same signal transduction pathway via compartmentalization, scaffold, or anchor proteins is an essential aspect of the signal transduction system in eukaryotic cells. If interaction must occur via free diffusion, then the spatial separation between the sources of the two interacting proteins and their degradation rates become primary determinants of the time required for interaction. To understand the role of such colocalization, we create a mathematical model of the diffusion-based protein-protein interaction process. We assume that mRNAs, which serve as the sources of these proteins, are located at different positions in the cytoplasm. For large cells such as Drosophila oocytes we show that if the source mRNAs were at random locations in the cell rather than colocalized, the average rate of interactions would be extremely small, which suggests that localization is needed to facilitate protein interactions and not just to prevent cross-talk between different signaling modules.
View details for DOI 10.1073/pnas.0401314101
View details for Web of Science ID 000221107900023
View details for PubMedID 15096590
View details for PubMedCentralID PMC404064
-
Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
2004; 66: 187-205
View details for Web of Science ID 000187448400012
-
Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans
AMERICAN JOURNAL OF PATHOLOGY
2003; 163 (6): 2383-2395
Abstract
Dermatofibrosarcoma protuberans (DFSP) is an aggressive spindle cell neoplasm. It is associated with the chromosomal translocation t(17;22), which fuses the COL1A1 and PDGF beta genes. We determined the characteristic gene expression profile of DFSP and characterized DNA copy number changes in DFSP by array-based comparative genomic hybridization (array CGH). Fresh frozen and formalin-fixed, paraffin-embedded samples of DFSP were analyzed by array CGH (four cases) and DNA microarray analysis of global gene expression (nine cases). The nine DFSPs were readily distinguished from 27 other diverse soft tissue tumors based on their gene expression patterns. Genes characteristically expressed in the DFSPs included PDGF beta and its receptor, PDGFRB, APOD, MEOX1, PLA2R, and PRKCA. Array CGH of DNA extracted either from frozen tumor samples or from paraffin blocks yielded equivalent results. Large areas of chromosomes 17q and 22q, bounded by COL1A1 and PDGF beta, respectively, were amplified in DFSP. Expression of genes in the amplified regions was significantly elevated. Our data show that: 1) DFSP has a distinctive gene expression profile; 2) array CGH can be applied successfully to frozen or formalin-fixed, paraffin-embedded tumor samples; 3) a characteristic amplification of sequences from chromosomes 17q and 22q, demarcated by the COL1A1 and PDGF beta genes, respectively, was associated with elevated expression of the amplified genes.
View details for PubMedID 14633610
-
Rotation space random fields with an application to fMRI data
ANNALS OF STATISTICS
2003; 31 (6): 1732-1771
View details for Web of Science ID 000188780400002
-
Statistical analysis of direct identity-by-descent mapping
ANNALS OF HUMAN GENETICS
2003; 67: 464-470
Abstract
Genetic mismatch scanning has been suggested as a method for using pairs of ostensibly unrelated, but putatively distantly related, affected individuals in isolated populations to map disease genes. We model the regions of identity-by-descent of these affected pairs as a continuous time two state process with unknown parameters that depend on the (unknown) relationships, and we estimate the unknown parameters from the observed data. Simulated data involving pairs of first to fourth cousins show that the procedure thus obtained has properties similar, albeit slightly inferior, to the case where the relationships of the affected pairs, hence the parameters governing the processes, are known.
View details for Web of Science ID 000185326600007
View details for PubMedID 12940919
-
Upward bias in estimation of genetic effects
AMERICAN JOURNAL OF HUMAN GENETICS
2002; 71 (5): 1183-1188
Abstract
Because of the large number of tests for linkage that are performed in genome scans, the naive estimator of the size of a genetic effect in cases of borderline significance can be inflated and lead to unrealistic expectations for successful replication. As a remedy, this report proposes lower confidence limits that account for the multiple comparisons of the genome scan.
View details for Web of Science ID 000178884300016
View details for PubMedID 12386837
View details for PubMedCentralID PMC385094
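The upward bias described above can be seen in a small simulation: if many tests share the same true standardized effect, the estimate attached to the most significant one is systematically too large. The second function shows one simple lower confidence limit that accounts for multiple comparisons, a Bonferroni adjustment. That adjustment is an assumption chosen for illustration and is not necessarily the construction proposed in the report; function names and parameters are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def winners_curse_bias(true_effect, n_tests, n_reps, seed=0):
    """Average excess of the selected (maximal) estimate over the true standardized effect."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated scan: n_tests estimates with unit standard error.
    z = true_effect + rng.standard_normal((n_reps, n_tests))
    selected = z.max(axis=1)  # naive estimate reported for the most significant test
    return float(selected.mean() - true_effect)


def bonferroni_lower_limit(z_max, n_tests, alpha=0.05):
    """One-sided lower limit valid simultaneously over n_tests (Bonferroni adjustment)."""
    return z_max - NormalDist().inv_cdf(1.0 - alpha / n_tests)
```

With 100 tests the selected estimate overshoots the truth by roughly 2.5 standard errors on average, and the adjusted lower limit for an observed Z of 5.0 is about 1.71.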
-
Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition
GENETICS
2002; 161 (1): 447-459
Abstract
This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.
View details for Web of Science ID 000175814900040
View details for PubMedID 12019257
-
Mapping multiple genes for quantitative or complex traits
GENETIC EPIDEMIOLOGY
2002; 22 (4): 313-327
Abstract
Models for complex and quantitative traits that involve multiple, possibly interacting, genes are described. Methods of linkage analysis are developed that utilize special features of these models, and their power is compared with that of simple genome scans that ignore these special features. Our calculations show that for family-based nonparametric linkage analysis in human genetics, in contrast to experimental genetics, there are limits to the increase in power that can be achieved by correctly modeling gene-gene interactions. In particular, the noncentrality parameter of likelihood-based statistics to detect single gene effects involves both single gene and interaction components of variance, so even when the interaction components of variance are relatively large, the incremental power from a statistic designed to detect both single gene and interaction effects is often quite modest. We carry out our analysis with the assistance of a parameterization that allows us to compute score statistics, noncentrality parameters, and Fisher information matrices reasonably explicitly.
View details for DOI 10.1002/gepi.01108
View details for Web of Science ID 000175413700004
View details for PubMedID 11984864
-
Mapping quantitative trait loci in oligogenic models.
Biostatistics
2001; 2 (2): 147-162
Abstract
We discuss strategies for mapping quantitative trait loci with emphasis on certain issues of study design that have recently received attention: e.g. genotyping only selected pedigrees and the comparative value of large pedigrees versus sib pairs. We use a standard variance components model and a parametrization of the genetic effects in which the 'segregation' parameters are locally orthogonal to the 'linkage' parameters. This permits simple explicit expressions for the expectation of the score statistic, which we use to compare the power of different strategies. We also discuss robustness of the score statistic.
View details for PubMedID 12933546
-
Is peak height sufficient?
GENETIC EPIDEMIOLOGY
2001; 20 (4): 403-408
Abstract
The suggestion that more power can be obtained from a genome scan by consideration of "peak width" in addition to "peak height" has been controversial. Regarding this question from the viewpoint of smoothing, one finds that to the extent that smoothing increases the informativeness of individual markers it is possible to obtain increased power; but for markers that are fully informative the value of smoothing is questionable.
View details for Web of Science ID 000168316300001
View details for PubMedID 11319781
-
Approximate p-values for local sequence alignments: Numerical studies
JOURNAL OF COMPUTATIONAL BIOLOGY
2001; 8 (5): 549-556
Abstract
Siegmund and Yakir (2000) have given an approximate p-value when two independent, identically distributed sequences from a finite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an infinite sequence of difficult-to-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and affine gap penalty, this modified approximation is easily evaluated. Comparison with published numerical results shows that it is reasonably accurate.
View details for Web of Science ID 000171950200006
View details for PubMedID 11694182
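The statistic whose p-value is approximated here is the optimal local alignment score. A minimal sketch of computing it with an arbitrary scoring function and an affine gap penalty, using the standard Smith-Waterman recursion with Gotoh's three matrices, is below. The penalty convention (a gap of length g costs `gap_open + gap_extend * (g - 1)`) and the function name are assumptions; the p-value approximation itself is not implemented.

```python
import math


def local_alignment_score(a, b, score, gap_open, gap_extend):
    """Optimal local alignment score (Smith-Waterman with affine gaps, Gotoh recursion).

    score(x, y): similarity of letters x and y (any scoring matrix as a function).
    A gap of length g is penalized gap_open + gap_extend * (g - 1).
    """
    n, m = len(a), len(b)
    NEG = -math.inf
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score of alignments ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]), E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

For instance, with +1 for a match, -1 for a mismatch, `gap_open=2` and `gap_extend=1`, aligning `ACGTACGT` against `ACGTXXACGT` scores 5: eight matches minus 3 for a single gap of length 2.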
-
Note on a stochastic recursion
Symposium on State of the Art in Probability and Statistics: Festschrift for Willem R VanZwet
INST MATHEMATICAL STATISTICS. 2001: 547–554
View details for Web of Science ID 000175458600029
-
Approximate p-values for local sequence alignments
ANNALS OF STATISTICS
2000; 28 (3): 657-680
View details for Web of Science ID 000165456000001
-
Tail probabilities for the null distribution of scanning statistics
BERNOULLI
2000; 6 (2): 191-213
View details for Web of Science ID 000086645400001
-
The maximum of a function of a Markov chain and application to linkage analysis
ADVANCES IN APPLIED PROBABILITY
1999; 31 (2): 510-531
View details for Web of Science ID 000083438800011
-
Statistical methods for mapping quantitative trait loci from a dense set of markers
GENETICS
1999; 151 (1): 373-386
Abstract
Lander and Botstein introduced statistical methods for searching an entire genome for quantitative trait loci (QTL) in experimental organisms, with emphasis on a backcross design and QTL having only additive effects. We extend their results to intercross and other designs, and we compare the power of the resulting test as a function of the magnitude of the additive and dominance effects, the sample size and intermarker distances. We also compare three methods for constructing confidence regions for a QTL: likelihood regions, Bayesian credible sets, and support regions. We show that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and we provide a theoretical explanation of the empirical observation that the size of the support region is proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory.
View details for Web of Science ID 000078028900034
View details for PubMedID 9872974
View details for PubMedCentralID PMC1460471
-
Multipoint linkage analysis using affected relative pairs and partially informative markers
BIOMETRICS
1998; 54 (4): 1247-1265
Abstract
Linkage analysis is a method of identifying regions of the human genome harboring genes affecting the risk for a particular disease. It works by finding chromosomal segments inherited by affected relatives from a common ancestor (i.e., identical by descent or IBD) in excess of that expected by chance. Two complicating factors are that only a relatively small number of genomic locations (marker loci) are examined and the number of distinct realizations (alleles) at each marker is not large. Hence, unambiguous determination of IBD is impossible for any genomic location without additional information. Assuming data from a set of mapped, partially informative markers, we evaluate the effectiveness of a method that analyzes the array of markers on each chromosome jointly (multipoint methods) as a function of the informativeness and density of the markers. For the special case of pairs of half siblings whose parents are also typed, a combination of analysis and simulation is used to obtain insight into the problem of setting thresholds to control the false-positive error rate. Approximations are given for the power, and guidelines are developed to help describe the trade-offs between marker density and informativeness.
View details for Web of Science ID 000077898700004
View details for PubMedID 9883537
-
Combining information within and between pedigrees for mapping complex traits
AMERICAN JOURNAL OF HUMAN GENETICS
1997; 60 (4): 979-992
Abstract
This paper is concerned with efficient strategies for gene mapping using pedigrees containing small numbers of affecteds and identity-by-descent data from closely spaced markers throughout the genome. Particular attention is paid to additive traits involving phenocopies and/or locus heterogeneity. For a sample of pedigrees containing a particular configuration of affecteds, e.g., pairs of siblings together with a first cousin, we use a likelihood analysis to find 1-df statistics that are very efficient over a broad range of penetrances and allele frequencies. We identify configurations of affecteds that are particularly powerful for detecting linkage, and we show how pedigrees containing different numbers and configurations of affecteds can be efficiently combined in an overall test statistic.
View details for Web of Science ID A1997WT61400028
View details for PubMedID 9106545
-
Strategies for mapping heterogeneous recessive traits by allele-sharing methods
AMERICAN JOURNAL OF HUMAN GENETICS
1997; 60 (4): 965-978
Abstract
We investigate strategies for detecting linkage of recessive and partially recessive traits, using sibling pairs and inbred individuals. We assume that a genomewide search is being conducted and that locus heterogeneity of the trait is likely. For sibling pairs, we evaluate the efficiency of different statistics under the assumption that one does not know the true degree of recessiveness of the trait. We recommend a sibling-pair statistic that is a linear compromise between two previously suggested statistics. We also compare the power of sibling pairs to that of more distant relatives, such as cousins. For inbred individuals, we evaluate the power of offspring of different types of matings and compare them to sibling pairs. Over a broad range of trait etiologies, sibling pairs are more powerful than inbred individuals, but for traits caused by very rare alleles, particularly in the case of heterogeneity, inbred individuals can be much more powerful. The models we develop can also be used to examine specific situations other than those we look at. We present this analysis in the idealized context of a dense set of highly polymorphic markers. In general, incorporation of real-world complexities makes inbred individuals, particularly offspring of distant relatives, look slightly less useful than our results imply.
View details for Web of Science ID A1997WT61400027
View details for PubMedID 9106544
View details for PubMedCentralID PMC1712456
-
The approximate distribution of the maximum of a smoothed Poisson random field
STATISTICA SINICA
1997; 7 (1): 167-180
View details for Web of Science ID A1997WF57100011
-
STATISTICAL-METHODS FOR LINKAGE ANALYSIS OF COMPLEX TRAITS FROM HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT
GENETICS
1995; 140 (2): 843-856
Abstract
A multilocus model for complex traits is described that generalizes the additive and multiplicative models and hence allows simultaneously for both heterogeneity and gene interaction (epistasis). Statistical methods of linkage analysis are discussed under the assumption that identity by descent data from a dense set of polymorphic markers are available. Three methods, single locus search, simultaneous search and conditional search, are described and compared.
View details for Web of Science ID A1995RA36600035
View details for PubMedID 7498758
View details for PubMedCentralID PMC1206656
-
TESTING FOR A SIGNAL WITH UNKNOWN LOCATION AND SCALE IN A STATIONARY GAUSSIAN RANDOM-FIELD
ANNALS OF STATISTICS
1995; 23 (2): 608-639
View details for Web of Science ID A1995RH23800016
-
USING THE GENERALIZED LIKELIHOOD RATIO STATISTIC FOR SEQUENTIAL DETECTION OF A CHANGE-POINT
ANNALS OF STATISTICS
1995; 23 (1): 255-271
View details for Web of Science ID A1995RE61100016
-
Confidence regions in broken line regression
AMS/IMS/SIAM Summer Research Conference on Change-Point Problems
INST MATHEMATICAL STATISTICS. 1994: 292–316
View details for Web of Science ID A1994BF61K00023
-
THE EXPECTED NUMBER OF LOCAL MAXIMA OF A RANDOM-FIELD AND THE VOLUME OF TUBES
ANNALS OF STATISTICS
1993; 21 (4): 1948-1966
View details for Web of Science ID A1993MQ47700013
-
GAUSSIAN MODELS FOR GENETIC-LINKAGE ANALYSIS USING COMPLETE HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT
AMERICAN JOURNAL OF HUMAN GENETICS
1993; 53 (1): 234-251
Abstract
Gaussian-process models are developed to detect genetic linkage using complete high-resolution maps of identity by descent between affected relative pairs. Approximations are given for the significance level and power of the likelihood-ratio test of no linkage and for likelihood-ratio confidence regions for trait loci. The sample sizes required to detect linkage by using different classes of affected relative pairs are compared, and the problem of combining data from different classes of relatives is discussed.
View details for Web of Science ID A1993LJ38500027
View details for PubMedID 8317489
View details for PubMedCentralID PMC1682227
-
A SEQUENTIAL CLINICAL-TRIAL FOR COMPARING 3 TREATMENTS
ANNALS OF STATISTICS
1993; 21 (1): 464-483
View details for Web of Science ID A1993LB38900026
-
ASYMPTOTIC APPROXIMATIONS FOR LIKELIHOOD RATIO TESTS AND CONFIDENCE-REGIONS FOR A CHANGE-POINT IN THE MEAN OF A MULTIVARIATE NORMAL-DISTRIBUTION
STATISTICA SINICA
1992; 2 (1): 69-90
View details for Web of Science ID A1992HC09200004
-
SEQUENTIAL DETECTION OF A CHANGE IN A NORMAL-MEAN WHEN THE INITIAL-VALUE IS UNKNOWN
ANNALS OF STATISTICS
1991; 19 (1): 394-416
View details for Web of Science ID A1991FF04700028
-
CONFIDENCE-REGIONS IN SEMILINEAR REGRESSION
BIOMETRIKA
1991; 78 (1): 15-31
View details for Web of Science ID A1991FD52300003
-
ON HOTELLING'S APPROACH TO TESTING FOR A NONLINEAR PARAMETER IN REGRESSION
INTERNATIONAL STATISTICAL REVIEW
1989; 57 (3): 205-220
View details for Web of Science ID A1989CE41900002
-
THE LIKELIHOOD RATIO TEST FOR A CHANGE-POINT IN SIMPLE LINEAR-REGRESSION
BIOMETRIKA
1989; 76 (3): 409-423
View details for Web of Science ID A1989AR09100001
-
APPROXIMATE EXIT PROBABILITIES FOR A BROWNIAN BRIDGE ON A SHORT-TIME INTERVAL, AND APPLICATIONS
ADVANCES IN APPLIED PROBABILITY
1989; 21 (1): 1-19
View details for Web of Science ID A1989T648700001
-
ON HOTELLING'S FORMULA FOR THE VOLUME OF TUBES AND NAIMAN'S INEQUALITY
ANNALS OF STATISTICS
1989; 17 (1): 184-194
View details for Web of Science ID A1989T614100009
-
CONFIDENCE SETS IN CHANGE-POINT PROBLEMS
INTERNATIONAL STATISTICAL REVIEW
1988; 56 (1): 31-48
View details for Web of Science ID A1988M962700003
-
APPROXIMATE TAIL PROBABILITIES FOR THE MAXIMA OF SOME RANDOM-FIELDS
ANNALS OF PROBABILITY
1988; 16 (2): 487-501
View details for Web of Science ID A1988M945700003
-
CONDITIONAL BOUNDARY CROSSING PROBABILITIES, WITH APPLICATIONS TO CHANGE-POINT PROBLEMS
ANNALS OF PROBABILITY
1988; 16 (2): 825-839
View details for Web of Science ID A1988M945700023
-
BOUNDARY CROSSING PROBABILITIES AND STATISTICAL APPLICATIONS
ANNALS OF STATISTICS
1986; 14 (2): 361-404
View details for Web of Science ID A1986D171100001
-
CONVERGENCE OF QUASI-STATIONARY TO STATIONARY DISTRIBUTIONS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES
JOURNAL OF APPLIED PROBABILITY
1986; 23 (1): 215-220
View details for Web of Science ID A1986C400600020
-
LARGE DEVIATIONS FOR THE MAXIMA OF SOME RANDOM-FIELDS
ADVANCES IN APPLIED MATHEMATICS
1986; 7 (1): 2-22
View details for Web of Science ID A1986A881100002
-
A DIFFUSION PROCESS AND ITS APPLICATIONS TO DETECTING A CHANGE IN THE DRIFT OF BROWNIAN-MOTION
BIOMETRIKA
1985; 72 (2): 267-280
View details for Web of Science ID A1985AMZ4700004
-
FIXED ACCURACY ESTIMATION OF AN AUTOREGRESSIVE PARAMETER
ANNALS OF STATISTICS
1983; 11 (2): 478-485
View details for Web of Science ID A1983QT81500010
-
SEQUENTIAL-ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL
BIOMETRIKA
1983; 70 (2): 315-326
View details for Web of Science ID A1983RA65800002
-
CONTINUOUS INTRAVENOUS VASOPRESSIN IN ACTIVE UPPER GASTROINTESTINAL-BLEEDING - A PLACEBO-CONTROLLED TRIAL
ANNALS OF INTERNAL MEDICINE
1982; 96 (5): 565-569
Abstract
Sixty patients with active upper gastrointestinal bleeding were randomized to receive either continuous intravenous infusions of vasopressin (29 patients) or placebo (31 patients) at a rate of 40 U/h. Six hours after beginning the study, 13 patients in the vasopressin group and 11 in the placebo group had ceased bleeding (p = 0.46). By 24 hours, 17 patients in the vasopressin group and 14 in the placebo group had stopped bleeding (p = 0.30). Restriction of the analysis to patients bleeding from varices showed no advantage with vasopressin treatment after 6 or 24 hours. No consistent trend favoring use of vasopressin to stop hemorrhage was noted during the 30-month study period. There was little difference between the two groups in the number of patients needing surgery (13 on vasopressin, 18 on placebo; p = 0.30) or the number of deaths (eight on vasopressin, 11 on placebo; p = 0.51); the transfusion requirement was the same. In our patients, a continuous intravenous infusion of vasopressin neither controlled bleeding nor altered outcome.
View details for Web of Science ID A1982NP94900004
View details for PubMedID 7041728
-
BROWNIAN APPROXIMATIONS TO 1ST PASSAGE PROBABILITIES
ZEITSCHRIFT FUR WAHRSCHEINLICHKEITSTHEORIE UND VERWANDTE GEBIETE
1982; 59 (2): 239-248
View details for Web of Science ID A1982NJ23400009
-
LARGE DEVIATIONS FOR BOUNDARY CROSSING PROBABILITIES
ANNALS OF PROBABILITY
1982; 10 (3): 581-588
View details for Web of Science ID A1982PC77000005
-
MAXIMALLY SELECTED CHI-SQUARE STATISTICS
BIOMETRICS
1982; 38 (4): 1011-1016
View details for Web of Science ID A1982QF89700013
-
A SEQUENTIAL CLINICAL-TRIAL FOR TESTING P1=P2
ANNALS OF STATISTICS
1980; 8 (6): 1219-1228
View details for Web of Science ID A1980KT80600003
-
SEQUENTIAL χ² AND F-TESTS AND THE RELATED CONFIDENCE-INTERVALS
BIOMETRIKA
1980; 67 (2): 389-402
View details for Web of Science ID A1980KA99800013
-
NON-LINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS II
ANNALS OF STATISTICS
1979; 7 (1): 60-76
View details for Web of Science ID A1979GL00300004
-
CORRECTED DIFFUSION APPROXIMATIONS IN CERTAIN RANDOM-WALK PROBLEMS
ADVANCES IN APPLIED PROBABILITY
1979; 11 (4): 701-719
View details for Web of Science ID A1979HV96600002
-
ESTIMATION FOLLOWING SEQUENTIAL TESTS
BIOMETRIKA
1978; 65 (2): 341-349
View details for Web of Science ID A1978FL40100012
-
NONLINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS I
ANNALS OF STATISTICS
1977; 5 (5): 946-954
View details for Web of Science ID A1977DV19800010
-
REPEATED SIGNIFICANCE TESTS FOR A NORMAL MEAN
BIOMETRIKA
1977; 64 (2): 177-189
View details for Web of Science ID A1977DQ65700001
-
EQUIVALENCE OF ABSORBING AND REFLECTING BARRIER PROBLEMS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES
ANNALS OF PROBABILITY
1976; 4 (6): 914-924
View details for Web of Science ID A1976CN99500004
-
PROBABILITY DISTRIBUTIONS RELATED TO LAW OF ITERATED LOGARITHM
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
1969; 62 (1): 11-?
Abstract
Let $W(t)$ denote a standard Wiener process for $0 \le t < \infty$. We compute the probability that $W(t) \ge t^{1/2} A(t)$ for some $t \ge 1$ (or for some $t \ge 0$) for a certain class of functions $A(t)$, including functions which are approximately $(2 \log\log t)^{1/2}$ as $t \to \infty$. We also give an invariance principle which states that this probability is the limit as $m \to \infty$ of the probability that $s(n) \ge n^{1/2} A(n/m)$ for some $n \ge m$ (or for some $n \ge 1$), where $s(n)$ is the sum of $n$ independent and identically distributed random variables with mean 0 and variance 1.
View details for Web of Science ID A1969C892200002
View details for PubMedID 16591726
View details for PubMedCentralID PMC285947