All Publications


  • SEGMENTATION AND ESTIMATION OF CHANGE-POINT MODELS: FALSE POSITIVE CONTROL AND CONFIDENCE REGIONS ANNALS OF STATISTICS Fang, X., Li, J., Siegmund, D. 2020; 48 (3): 1615-1647

    View details for DOI 10.1214/19-AOS1861

    View details for Web of Science ID 000551644000017

  • A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic acids research Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R. 2016; 44 (15)

    Abstract

    We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

    View details for DOI 10.1093/nar/gkw481

    View details for PubMedID 27325742

    View details for PubMedCentralID PMC5009736

  • POISSON APPROXIMATION FOR TWO SCAN STATISTICS WITH RATES OF CONVERGENCE ANNALS OF APPLIED PROBABILITY Fang, X., Siegmund, D. 2016; 26 (4): 2384-2418

    View details for DOI 10.1214/15-AAP1150

    View details for Web of Science ID 000383411200014

  • SCAN STATISTICS ON POISSON RANDOM FIELDS WITH APPLICATIONS IN GENOMICS ANNALS OF APPLIED STATISTICS Zhang, N. R., Yakir, B., Xia, L. C., Siegmund, D. 2016; 10 (2): 726-755

    View details for DOI 10.1214/15-AOAS892

    View details for Web of Science ID 000385029700008

  • HIGHER CRITICISM: p-VALUES AND CRITICISM ANNALS OF STATISTICS Li, J., Siegmund, D. 2015; 43 (3): 1323-1350

    View details for DOI 10.1214/15-AOS1312

    View details for Web of Science ID 000355768700013

  • SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION ANNALS OF STATISTICS Xie, Y., Siegmund, D. 2013; 41 (2): 670-692

    View details for DOI 10.1214/13-AOS1094

    View details for Web of Science ID 000320488200010

  • Change-Points: From Sequential Detection to Biology and Back SEQUENTIAL ANALYSIS-DESIGN METHODS AND APPLICATIONS Siegmund, D. 2013; 32 (1): 2-14
  • SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION Information Theory and Applications Workshop Xie, Y., Siegmund, D. IEEE. 2013
  • MODEL SELECTION FOR HIGH-DIMENSIONAL, MULTI-SEQUENCE CHANGE-POINT PROBLEMS STATISTICA SINICA Zhang, N. R., Siegmund, D. O. 2012; 22 (4): 1507-1538
  • Spectrum Opportunity Detection with Weak and Correlated Signals 46th Asilomar Conference on Signals, Systems and Computers Xie, Y., Siegmund, D. IEEE. 2012: 128–132
  • False discovery rate for scanning statistics BIOMETRIKA Siegmund, D. O., Zhang, N. R., Yakir, B. 2011; 98 (4): 979-985
  • DETECTING SIMULTANEOUS VARIANT INTERVALS IN ALIGNED SEQUENCES ANNALS OF APPLIED STATISTICS Siegmund, D., Yakir, B., Zhang, N. R. 2011; 5 (2A): 645-668

    View details for DOI 10.1214/10-AOAS400

    View details for Web of Science ID 000295453300003

  • Joint Testing of Genotype and Ancestry Association in Admixed Families GENETIC EPIDEMIOLOGY Tang, H., Siegmund, D. O., Johnson, N. A., Romieu, I., London, S. J. 2010; 34 (8): 783-791

    Abstract

    Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

    View details for DOI 10.1002/gepi.20520

    View details for Web of Science ID 000284719100002

    View details for PubMedID 21031451

    View details for PubMedCentralID PMC3103820

  • Detecting simultaneous changepoints in multiple sequences BIOMETRIKA Zhang, N. R., Siegmund, D. O., Ji, H., Li, J. Z. 2010; 97 (3): 631-645

    Abstract

    We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.

    View details for DOI 10.1093/biomet/asq025

    View details for Web of Science ID 000280904000008

    View details for PubMedID 22822250

    View details for PubMedCentralID PMC3372242

  • Mapping Quantitative Traits in Unselected Families: Algorithms and Examples GENETIC EPIDEMIOLOGY Dupuis, J., Shi, J., Manning, A. K., Benjamin, E. J., Meigs, J. B., Cupples, L. A., Siegmund, D. 2009; 33 (7): 617-627

    Abstract

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which in contrast to the likelihood ratio statistic can use nonparametric estimators of variability to achieve robustness of the false-positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity by descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study.

    View details for DOI 10.1002/gepi.20413

    View details for Web of Science ID 000271406100006

    View details for PubMedID 19278016

    View details for PubMedCentralID PMC2766029

  • Minimax optimality of the Shiryayev-Roberts change-point detection rule JOURNAL OF STATISTICAL PLANNING AND INFERENCE Siegmund, D. O., Yakir, B. 2008; 138 (9): 2815-2825
  • The distribution of maxima of approximately Gaussian random fields ANNALS OF STATISTICS Nardi, Y., Siegmund, D. O., Yakir, B. 2008; 36 (3): 1375-1403

    View details for DOI 10.1214/07-AOS511

    View details for Web of Science ID 000256504400014

  • Detecting the emergence of a signal in a noisy image STATISTICS AND ITS INTERFACE Siegmund, D., Yakir, B. 2008; 1 (1): 3-12
  • A unified framework for linkage and association analysis of quantitative traits PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Dupuis, J., Siegmund, D. O., Yakir, B. 2007; 104 (51): 20210-20215

    Abstract

    We give a unified treatment of the statistical foundations of population based association mapping and of family based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.

    View details for DOI 10.1073/pnas.0707138105

    View details for Web of Science ID 000251885000013

    View details for PubMedID 18077372

    View details for PubMedCentralID PMC2154410

  • Importance sampling for estimating p values in linkage analysis JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Shi, J., Siegmund, D., Yakir, B. 2007; 102 (479): 929-937
  • A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data BIOMETRICS Zhang, N. R., Siegmund, D. O. 2007; 63 (1): 22-32

    Abstract

    In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.

    View details for DOI 10.1111/j.1541-0420.2006.00662.x

    View details for Web of Science ID 000244647100003

    View details for PubMedID 17447926

  • Statistical corrections of linkage data suggest predominantly cis regulations of gene expression. BMC proceedings Shi, J., Siegmund, D. O., Levinson, D. F. 2007; 1: S145

    Abstract

    Morley et al. (Nature 2004, 430:743-747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD asymptotically equal to 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).

    View details for PubMedID 18466489

  • Approximating the variance of the conditional probability of the state of a hidden Markov model STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY Siegmund, D. O., Yakir, B. 2007; 6

    Abstract

    In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe directly the state of the hidden process. In the case where changes of state occur slowly relative to the speed at which information about the underlying state accumulates in the observed data, we compute approximately these covariances in terms of functionals of Brownian motion that arise in change-point analysis. Applications in gene mapping, where these covariances play a role in standardizing the score statistic and in evaluating the loss of noncentrality due to incomplete information, are discussed. Numerical examples illustrate the range of validity and limitations of our results.

    View details for Web of Science ID 000252387000001

    View details for PubMedID 17672820

  • QTL mapping under ascertainment ANNALS OF HUMAN GENETICS Peng, J., Siegmund, D. 2006; 70: 867-881

    Abstract

    Mapping quantitative trait loci (QTL) using ascertained sibships is discussed. It is shown that under the standard normality assumption of variance components analysis the efficient scores are unchanged by ascertainment, and two different schemes of ascertainment correction suggested in the literature are asymptotically equivalent. The use of conditional maximum likelihood estimators derived under the normality assumption to estimate nuisance parameters is shown to result in only a small loss of power compared to the case of known parameters, even when the distribution of phenotypes is non-normal and/or the ascertainment criterion is ill defined.

    View details for DOI 10.1111/j.1469-1809.2006.00286.x

    View details for Web of Science ID 000241191400019

    View details for PubMedID 17044862

  • Spatial regulation and the rate of signal transduction activation PLOS COMPUTATIONAL BIOLOGY Batada, N. N., Shepp, L. A., Siegmund, D. O., Levitt, M. 2006; 2 (5): 343-349

    Abstract

    Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.

    View details for DOI 10.1371/journal.pcbi.0020044

    View details for Web of Science ID 000239493900003

    View details for PubMedID 16699596

    View details for PubMedCentralID PMC1458967

  • Genome scans with gene-covariate interaction GENETIC EPIDEMIOLOGY Peng, J., Tang, H. K., Siegmund, D. 2005; 29 (3): 173-184

    Abstract

    Genetic models for gene-covariate interactions are described. Methods of linkage analysis that utilize special features of these models and the corresponding score statistics are derived. Their power is compared with that of simple genome scans that ignore these special features, and substantial gains in power are observed when the gene-covariate interaction is strong. Quantitative trait mapping in randomly ascertained sibships and affected sibpair mapping are discussed. For the latter case, a simpler statistic is proposed that has similar performance to the score statistic, but does not require the estimation of nuisance parameters. Since the nuisance parameters are not estimable solely from affected sib-pair data, this statistic would be much easier to apply in practice. Similarities with linkage analysis of models for longitudinal data and multivariate phenotypes are also briefly discussed. Approximations for the P-value and power are derived under the framework of local alternatives.

    View details for DOI 10.1002/gepi.20100

    View details for Web of Science ID 000233059200001

    View details for PubMedID 16216012

  • An urn model of Diaconis ANNALS OF PROBABILITY Siegmund, D., Yakir, B. 2005; 33 (5): 2036-2042
  • On the power for linkage detection using a test based on scan statistics BIOSTATISTICS Hernandez, S., Siegmund, D. O., de Gunst, M. 2005; 6 (2): 259-269

    Abstract

    We analyze some aspects of scan statistics, which have been proposed to aid the detection of weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity by descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.

    View details for Web of Science ID 000228428200007

    View details for PubMedID 15772104

  • The admixture model in linkage analysis JOURNAL OF STATISTICAL PLANNING AND INFERENCE Peng, J., Siegmund, D. 2005; 130 (1-2): 317-324
  • Model selection in irregular problems: Applications to mapping quantitative trait loci BIOMETRIKA Siegmund, D. 2004; 91 (4): 785-800
  • A report on the future of statistics STATISTICAL SCIENCE Lindsay, B. G., Kettenring, J., Siegmund, D. O. 2004; 19 (3): 387-407
  • Mapping quantitative traits with random and with ascertained sibships PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Peng, J., Siegmund, D. 2004; 101 (21): 7845-7850

    Abstract

    Use of a robust score statistic based on a variance components model to map quantitative trait loci in randomly sampled pedigrees is reviewed. Sibships ascertained through a single proband are discussed. Under a standard assumption of multivariate normality, two suggested methods of ascertainment correction are shown to be asymptotically equivalent when the number of sibships is large.

    View details for DOI 10.1073/pnas.0401713101

    View details for Web of Science ID 000221652000003

    View details for PubMedID 15084737

    View details for PubMedCentralID PMC419519

  • Stochastic model of protein-protein interaction: Why signaling proteins need to be colocalized PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Batada, N. N., Shepp, L. A., Siegmund, D. O. 2004; 101 (17): 6445-6449

    Abstract

    Colocalization of proteins that are part of the same signal transduction pathway via compartmentalization, scaffold, or anchor proteins is an essential aspect of the signal transduction system in eukaryotic cells. If interaction must occur via free diffusion, then the spatial separation between the sources of the two interacting proteins and their degradation rates become primary determinants of the time required for interaction. To understand the role of such colocalization, we create a mathematical model of the diffusion based protein-protein interaction process. We assume that mRNAs, which serve as the sources of these proteins, are located at different positions in the cytoplasm. For large cells such as Drosophila oocytes we show that if the source mRNAs were at random locations in the cell rather than colocalized, the average rate of interactions would be extremely small, which suggests that localization is needed to facilitate protein interactions and not just to prevent cross-talk between different signaling modules.

    View details for DOI 10.1073/pnas.0401314101

    View details for Web of Science ID 000221107900023

    View details for PubMedID 15096590

    View details for PubMedCentralID PMC404064

  • Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Storey, J. D., Taylor, J. E., Siegmund, D. 2004; 66: 187-205
  • Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans AMERICAN JOURNAL OF PATHOLOGY Linn, S. C., West, R. B., Pollack, J. R., Zhu, S., Hernandez-Boussard, T., Nielsen, T. O., Rubin, B. P., Patel, R., Goldblum, J. R., Siegmund, D., Botstein, D., Brown, P. O., Gilks, C. B., van de Rijn, M. 2003; 163 (6): 2383-2395

    Abstract

    Dermatofibrosarcoma protuberans (DFSP) is an aggressive spindle cell neoplasm. It is associated with the chromosomal translocation, t(17;22), which fuses the COL1A1 and PDGF beta genes. We determined the characteristic gene expression profile of DFSP and characterized DNA copy number changes in DFSP by array-based comparative genomic hybridization (array CGH). Fresh frozen and formalin-fixed, paraffin-embedded samples of DFSP were analyzed by array CGH (four cases) and DNA microarray analysis of global gene expression (nine cases). The nine DFSPs were readily distinguished from 27 other diverse soft tissue tumors based on their gene expression patterns. Genes characteristically expressed in the DFSPs included PDGF beta and its receptor, PDGFRB, APOD, MEOX1, PLA2R, and PRKCA. Array CGH of DNA extracted either from frozen tumor samples or from paraffin blocks yielded equivalent results. Large areas of chromosomes 17q and 22q, bounded by COL1A1 and PDGF beta, respectively, were amplified in DFSP. Expression of genes in the amplified regions was significantly elevated. Our data show that: 1) DFSP has a distinctive gene expression profile; 2) array CGH can be applied successfully to frozen or formalin-fixed, paraffin-embedded tumor samples; 3) a characteristic amplification of sequences from chromosomes 17q and 22q, demarcated by the COL1A1 and PDGF beta genes, respectively, was associated with elevated expression of the amplified genes.

    View details for PubMedID 14633610

  • Rotation space random fields with an application to fMRI data ANNALS OF STATISTICS Shafie, K., Sigal, B., Siegmund, D., Worsley, K. J. 2003; 31 (6): 1732-1771
  • Statistical analysis of direct identity-by-descent mapping ANNALS OF HUMAN GENETICS Siegmund, D., Yakir, B. 2003; 67: 464-470

    Abstract

    Genetic mismatch scanning has been suggested as a method for using affected pairs of ostensibly unrelated but putatively distantly related affecteds in isolated populations to map disease genes. We model the regions of identity-by-descent of these affected pairs as a continuous time two state process with unknown parameters that depend on the (unknown) relationships, and we estimate the unknown parameters from the observed data. Simulated data involving pairs of first to fourth cousins show that the procedure thus obtained has properties similar, albeit slightly inferior, to the case where the relationships of the affected pairs, hence the parameters governing the processes, are known.

    View details for Web of Science ID 000185326600007

    View details for PubMedID 12940919

  • Upward bias in estimation of genetic effects AMERICAN JOURNAL OF HUMAN GENETICS Siegmund, D. 2002; 71 (5): 1183-1188

    Abstract

    Because of the large number of tests for linkage that are performed in genome scans, the naive estimator of the size of a genetic effect in cases of borderline significance can be inflated and lead to unrealistic expectations for successful replication. As a remedy, this report proposes lower confidence limits that account for the multiple comparisons of the genome scan.

    View details for Web of Science ID 000178884300016

    View details for PubMedID 12386837

    View details for PubMedCentralID PMC385094

  • Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition GENETICS Tang, H., Siegmund, D. O., Shen, P. D., Oefner, P. J., Feldman, M. W. 2002; 161 (1): 447-459

    Abstract

    This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.

    View details for Web of Science ID 000175814900040

    View details for PubMedID 12019257

  • Mapping multiple genes for quantitative or complex traits GENETIC EPIDEMIOLOGY Tang, H. K., Siegmund, D. 2002; 22 (4): 313-327

    Abstract

    Models for complex and quantitative traits that involve multiple, possibly interacting, genes are described. Methods of linkage analysis are developed that utilize special features of these models, and their power is compared with that of simple genome scans that ignore these special features. Our calculations show that for family-based nonparametric linkage analysis in human genetics, in contrast to experimental genetics, there are limits to the increase in power that can be achieved by correctly modeling gene-gene interactions. In particular, the noncentrality parameter of likelihood-based statistics to detect single gene effects involves both single gene and interaction components of variance, so even when the interaction components of variance are relatively large, the incremental power from a statistic designed to detect both single gene and interaction effects is often quite modest. We carry out our analysis with the assistance of a parameterization that allows us to compute score statistics, noncentrality parameters, and Fisher information matrices reasonably explicitly.

    View details for DOI 10.1002/gepi.01108

    View details for Web of Science ID 000175413700004

    View details for PubMedID 11984864

  • Mapping quantitative trait loci in oligogenic models. Biostatistics Tang, H. K., Siegmund, D. 2001; 2 (2): 147-162

    Abstract

    We discuss strategies for mapping quantitative trait loci with emphasis on certain issues of study design that have recently received attention: e.g. genotyping only selected pedigrees and the comparative value of large pedigrees versus sib pairs. We use a standard variance components model and a parametrization of the genetic effects in which the 'segregation' parameters are locally orthogonal to the 'linkage' parameters. This permits simple explicit expressions for the expectation of the score statistic, which we use to compare the power of different strategies. We also discuss robustness of the score statistic.

    View details for PubMedID 12933546

  • Is peak height sufficient? GENETIC EPIDEMIOLOGY Siegmund, D. 2001; 20 (4): 403-408

    Abstract

    The suggestion that more power can be obtained from a genome scan by consideration of "peak width" in addition to "peak height" has been controversial. Regarding this question from the viewpoint of smoothing, one finds that to the extent that smoothing increases the informativeness of individual markers it is possible to obtain increased power; but for markers that are fully informative the value of smoothing is questionable.

    View details for Web of Science ID 000168316300001

    View details for PubMedID 11319781

  • Approximate p-values for local sequence alignments: Numerical studies JOURNAL OF COMPUTATIONAL BIOLOGY Storey, J. D., Siegmund, D. 2001; 8 (5): 549-556

    Abstract

    Siegmund and Yakir (2000) have given an approximate p-value when two independent, identically distributed sequences from a finite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an infinite sequence of difficult-to-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and affine gap penalty, this modified approximation is easily evaluated. Comparison with published numerical results shows that it is reasonably accurate.

    View details for Web of Science ID 000171950200006

    View details for PubMedID 11694182

  • Note on a stochastic recursion Symposium on State of the Art in Probability and Statistics: Festschrift for Willem R VanZwet Siegmund, D. INST MATHEMATICAL STATISTICS. 2001: 547–554
  • Approximate p-values for local sequence alignments ANNALS OF STATISTICS Siegmund, D., Yakir, B. 2000; 28 (3): 657-680
  • Tail probabilities for the null distribution of scanning statistics BERNOULLI Siegmund, D., Yakir, B. 2000; 6 (2): 191-213
  • The maximum of a function of a Markov chain and application to linkage analysis ADVANCES IN APPLIED PROBABILITY Tu, I. P., Siegmund, D. 1999; 31 (2): 510-531
  • Statistical methods for mapping quantitative trait loci from a dense set of markers GENETICS Dupuis, J., Siegmund, D. 1999; 151 (1): 373-386

    Abstract

    Lander and Botstein introduced statistical methods for searching an entire genome for quantitative trait loci (QTL) in experimental organisms, with emphasis on a backcross design and QTL having only additive effects. We extend their results to intercross and other designs, and we compare the power of the resulting test as a function of the magnitude of the additive and dominance effects, the sample size and intermarker distances. We also compare three methods for constructing confidence regions for a QTL: likelihood regions, Bayesian credible sets, and support regions. We show that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and we provide a theoretical explanation of the empirical observation that the size of the support region is proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory.

    View details for Web of Science ID 000078028900034

    View details for PubMedID 9872974

    View details for PubMedCentralID PMC1460471

  • Multipoint linkage analysis using affected relative pairs and partially informative markers BIOMETRICS Teng, J., Siegmund, D. 1998; 54 (4): 1247-1265

    Abstract

    Linkage analysis is a method of identifying regions of the human genome harboring genes affecting the risk for a particular disease. It works by finding chromosomal segments inherited by affected relatives from a common ancestor (i.e., identical by descent or IBD) in excess of that expected by chance. Two complicating factors are that only a relatively small number of genomic locations (marker loci) are examined and the number of distinct realizations (alleles) at each marker is not large. Hence, unambiguous determination of IBD is impossible for any genomic location without additional information. Assuming data from a set of mapped, partially informative markers, we evaluate the effectiveness of a method that analyzes the array of markers on each chromosome jointly (multipoint methods) as a function of the informativeness and density of the markers. For the special case of pairs of half siblings whose parents are also typed, a combination of analysis and simulation is used to obtain insight into the problem of setting thresholds to control the false-positive error rate. Approximations are given for the power, and guidelines are developed to help describe the trade-offs between marker density and informativeness.

    View details for Web of Science ID 000077898700004

    View details for PubMedID 9883537

  • Combining information within and between pedigrees for mapping complex traits AMERICAN JOURNAL OF HUMAN GENETICS Teng, J., Siegmund, D. 1997; 60 (4): 979-992

    Abstract

    This paper is concerned with efficient strategies for gene mapping using pedigrees containing small numbers of affecteds and identity-by-descent data from closely spaced markers throughout the genome. Particular attention is paid to additive traits involving phenocopies and/or locus heterogeneity. For a sample of pedigrees containing a particular configuration of affecteds, e.g., pairs of siblings together with a first cousin, we use a likelihood analysis to find 1-df statistics that are very efficient over a broad range of penetrances and allele frequencies. We identify configurations of affecteds that are particularly powerful for detecting linkage, and we show how pedigrees containing different numbers and configurations of affecteds can be efficiently combined in an overall test statistic.

    View details for Web of Science ID A1997WT61400028

    View details for PubMedID 9106545

  • Strategies for mapping heterogeneous recessive traits by allele-sharing methods AMERICAN JOURNAL OF HUMAN GENETICS Feingold, E., Siegmund, D. O. 1997; 60 (4): 965-978

    Abstract

    We investigate strategies for detecting linkage of recessive and partially recessive traits, using sibling pairs and inbred individuals. We assume that a genomewide search is being conducted and that locus heterogeneity of the trait is likely. For sibling pairs, we evaluate the efficiency of different statistics under the assumption that one does not know the true degree of recessiveness of the trait. We recommend a sibling-pair statistic that is a linear compromise between two previously suggested statistics. We also compare the power of sibling pairs to that of more distant relatives, such as cousins. For inbred individuals, we evaluate the power of offspring of different types of matings and compare them to sibling pairs. Over a broad range of trait etiologies, sibling pairs are more powerful than inbred individuals, but for traits caused by very rare alleles, particularly in the case of heterogeneity, inbred individuals can be much more powerful. The models we develop can also be used to examine specific situations other than those we look at. We present this analysis in the idealized context of a dense set of highly polymorphic markers. In general, incorporation of real-world complexities makes inbred individuals, particularly offspring of distant relatives, look slightly less useful than our results imply.

    View details for Web of Science ID A1997WT61400027

    View details for PubMedID 9106544

    View details for PubMedCentralID PMC1712456

  • The approximate distribution of the maximum of a smoothed Poisson random field STATISTICA SINICA Rabinowitz, D., Siegmund, D. 1997; 7 (1): 167-180
  • STATISTICAL-METHODS FOR LINKAGE ANALYSIS OF COMPLEX TRAITS FROM HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT GENETICS Dupuis, J., Brown, P. O., Siegmund, D. 1995; 140 (2): 843-856

    Abstract

    A multilocus model for complex traits is described that generalizes the additive and multiplicative models and hence allows simultaneously for both heterogeneity and gene interaction (epistasis). Statistical methods of linkage analysis are discussed under the assumption that identity by descent data from a dense set of polymorphic markers are available. Three methods, single locus search, simultaneous search and conditional search, are described and compared.

    View details for Web of Science ID A1995RA36600035

    View details for PubMedID 7498758

    View details for PubMedCentralID PMC1206656

  • TESTING FOR A SIGNAL WITH UNKNOWN LOCATION AND SCALE IN A STATIONARY GAUSSIAN RANDOM-FIELD ANNALS OF STATISTICS Siegmund, D. O., Worsley, K. J. 1995; 23 (2): 608-639
  • USING THE GENERALIZED LIKELIHOOD RATIO STATISTIC FOR SEQUENTIAL DETECTION OF A CHANGE-POINT ANNALS OF STATISTICS Siegmund, D., Venkatraman, E. S. 1995; 23 (1): 255-271
  • Confidence regions in broken line regression AMS/IMS/SIAM Summer Research Conference on Change-Point Problems Siegmund, D. O., Zhang, H. P. INST MATHEMATICAL STATISTICS. 1994: 292–316
  • THE EXPECTED NUMBER OF LOCAL MAXIMA OF A RANDOM-FIELD AND THE VOLUME OF TUBES ANNALS OF STATISTICS Siegmund, D., Zhang, H. P. 1993; 21 (4): 1948-1966
  • GAUSSIAN MODELS FOR GENETIC-LINKAGE ANALYSIS USING COMPLETE HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT AMERICAN JOURNAL OF HUMAN GENETICS Feingold, E., Brown, P. O., Siegmund, D. 1993; 53 (1): 234-251

    Abstract

    Gaussian-process models are developed to detect genetic linkage using complete high-resolution maps of identity by descent between affected relative pairs. Approximations are given for the significance level and power of the likelihood-ratio test of no linkage and for likelihood-ratio confidence regions for trait loci. The sample sizes required to detect linkage by using different classes of affected relative pairs are compared, and the problem of combining data from different classes of relatives is discussed.

    View details for Web of Science ID A1993LJ38500027

    View details for PubMedID 8317489

    View details for PubMedCentralID PMC1682227

  • A SEQUENTIAL CLINICAL-TRIAL FOR COMPARING 3 TREATMENTS ANNALS OF STATISTICS Siegmund, D. 1993; 21 (1): 464-483
  • ASYMPTOTIC APPROXIMATIONS FOR LIKELIHOOD RATIO TESTS AND CONFIDENCE-REGIONS FOR A CHANGE-POINT IN THE MEAN OF A MULTIVARIATE NORMAL-DISTRIBUTION STATISTICA SINICA James, B., James, K. L., Siegmund, D. 1992; 2 (1): 69-90
  • SEQUENTIAL DETECTION OF A CHANGE IN A NORMAL-MEAN WHEN THE INITIAL-VALUE IS UNKNOWN ANNALS OF STATISTICS Pollak, M., Siegmund, D. 1991; 19 (1): 394-416
  • CONFIDENCE-REGIONS IN SEMILINEAR REGRESSION BIOMETRIKA Knowles, M., Siegmund, D., Zhang, H. P. 1991; 78 (1): 15-31
  • ON HOTELLINGS APPROACH TO TESTING FOR A NONLINEAR PARAMETER IN REGRESSION INTERNATIONAL STATISTICAL REVIEW Knowles, M., Siegmund, D. 1989; 57 (3): 205-220
  • THE LIKELIHOOD RATIO TEST FOR A CHANGE-POINT IN SIMPLE LINEAR-REGRESSION BIOMETRIKA Kim, H. J., Siegmund, D. 1989; 76 (3): 409-423
  • APPROXIMATE EXIT PROBABILITIES FOR A BROWNIAN BRIDGE ON A SHORT-TIME INTERVAL, AND APPLICATIONS ADVANCES IN APPLIED PROBABILITY Lerche, H. R., Siegmund, D. 1989; 21 (1): 1-19
  • ON HOTELLING FORMULA FOR THE VOLUME OF TUBES AND NAIMAN INEQUALITY ANNALS OF STATISTICS Johnstone, I., Siegmund, D. 1989; 17 (1): 184-194
  • CONDITIONAL BOUNDARY CROSSING PROBABILITIES, WITH APPLICATIONS TO CHANGE-POINT PROBLEMS ANNALS OF PROBABILITY James, B., James, K. L., Siegmund, D. 1988; 16 (2): 825-839
  • APPROXIMATE TAIL PROBABILITIES FOR THE MAXIMA OF SOME RANDOM-FIELDS ANNALS OF PROBABILITY Siegmund, D. 1988; 16 (2): 487-501
  • CONFIDENCE SETS IN CHANGE-POINT PROBLEMS INTERNATIONAL STATISTICAL REVIEW Siegmund, D. 1988; 56 (1): 31-48
  • BOUNDARY CROSSING PROBABILITIES AND STATISTICAL APPLICATIONS ANNALS OF STATISTICS Siegmund, D. 1986; 14 (2): 361-404
  • CONVERGENCE OF QUASI-STATIONARY TO STATIONARY DISTRIBUTIONS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES JOURNAL OF APPLIED PROBABILITY Pollak, M., Siegmund, D. 1986; 23 (1): 215-220
  • LARGE DEVIATIONS FOR THE MAXIMA OF SOME RANDOM-FIELDS ADVANCES IN APPLIED MATHEMATICS Hogan, M. L., Siegmund, D. 1986; 7 (1): 2-22
  • A DIFFUSION PROCESS AND ITS APPLICATIONS TO DETECTING A CHANGE IN THE DRIFT OF BROWNIAN-MOTION BIOMETRIKA Pollak, M., Siegmund, D. 1985; 72 (2): 267-280
  • FIXED ACCURACY ESTIMATION OF AN AUTOREGRESSIVE PARAMETER ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1983; 11 (2): 478-485
  • SEQUENTIAL-ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL BIOMETRIKA Sellke, T., Siegmund, D. 1983; 70 (2): 315-326
  • MAXIMALLY SELECTED CHI-SQUARE STATISTICS BIOMETRICS Miller, R., Siegmund, D. 1982; 38 (4): 1011-1016
  • LARGE DEVIATIONS FOR BOUNDARY CROSSING PROBABILITIES ANNALS OF PROBABILITY Siegmund, D. 1982; 10 (3): 581-588
  • CONTINUOUS INTRAVENOUS VASOPRESSIN IN ACTIVE UPPER GASTROINTESTINAL-BLEEDING - A PLACEBO-CONTROLLED TRIAL ANNALS OF INTERNAL MEDICINE Fogel, M. R., Knauer, C. M., Andres, L. L., Mahal, A. S., Stein, D. E., Kemeny, M. J., Rinki, M. M., Walker, J. E., Siegmund, D., Gregory, P. B. 1982; 96 (5): 565-569

    Abstract

    Sixty patients with active upper gastrointestinal bleeding were randomized to receive either continuous intravenous infusions of vasopressin (29 patients) or placebo (31 patients) at a rate of 40 U/h. Six hours after beginning the study, 13 patients in the vasopressin group and 11 in the placebo group had ceased bleeding (p = 0.46). By 24 hours, 17 patients in the vasopressin group and 14 in the placebo group had stopped bleeding (p = 0.30). Restriction of the analysis to patients bleeding from varices showed no advantage with vasopressin treatment after 6 or 24 hours. No consistent trend favoring use of vasopressin to stop hemorrhage was noted during the 30-month study period. There was little difference between the two groups in the number of patients needing surgery (13 on vasopressin, 18 on placebo; p = 0.30) or the number of deaths (eight on vasopressin, 11 on placebo; p = 0.51); the transfusion requirement was the same. In our patients, a continuous intravenous infusion of vasopressin neither controlled bleeding nor altered outcome.

    View details for Web of Science ID A1982NP94900004

    View details for PubMedID 7041728

  • BROWNIAN APPROXIMATIONS TO 1ST PASSAGE PROBABILITIES ZEITSCHRIFT FUR WAHRSCHEINLICHKEITSTHEORIE UND VERWANDTE GEBIETE Siegmund, D., Yuh, Y. S. 1982; 59 (2): 239-248
  • A SEQUENTIAL CLINICAL-TRIAL FOR TESTING P1=P2 ANNALS OF STATISTICS Siegmund, D., Gregory, P. 1980; 8 (6): 1219-1228
  • SEQUENTIAL X2 AND F-TESTS AND THE RELATED CONFIDENCE-INTERVALS BIOMETRIKA Siegmund, D. 1980; 67 (2): 389-402
  • NON-LINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS .2. ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1979; 7 (1): 60-76
  • CORRECTED DIFFUSION APPROXIMATIONS IN CERTAIN RANDOM-WALK PROBLEMS ADVANCES IN APPLIED PROBABILITY Siegmund, D. 1979; 11 (4): 701-719
  • ESTIMATION FOLLOWING SEQUENTIAL TESTS BIOMETRIKA Siegmund, D. 1978; 65 (2): 341-349
  • NONLINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS I ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1977; 5 (5): 946-954
  • REPEATED SIGNIFICANCE TESTS FOR A NORMAL MEAN BIOMETRIKA Siegmund, D. 1977; 64 (2): 177-189
  • EQUIVALENCE OF ABSORBING AND REFLECTING BARRIER PROBLEMS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES ANNALS OF PROBABILITY Siegmund, D. 1976; 4 (6): 914-924
  • PROBABILITY DISTRIBUTIONS RELATED TO LAW OF ITERATED LOGARITHM PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Robbins, H., Siegmund, D. 1969; 62 (1): 11-?

    Abstract

    Let W(t) denote a standard Wiener process for 0 ≤ t < ∞. We compute the probability that W(t) ≥ t^(1/2) A(t) for some t ≥ 1 (or for some t ≥ 0) for a certain class of functions A(t), including functions which are approximately (2 log log t)^(1/2) as t → ∞. We also give an invariance principle which states that this probability is the limit as m → ∞ of the probability that s(n) ≥ n^(1/2) A(n/m) for some n ≥ m (or for some n ≥ 1), where s(n) is the sum of n independent and identically distributed random variables with mean 0 and variance 1.

    View details for Web of Science ID A1969C892200002

    View details for PubMedID 16591726

    View details for PubMedCentralID PMC285947