David Siegmund's Profile | Stanford Profiles

Academic Appointments

Emeritus Faculty, Acad Council, Statistics
Member, Bio-X

Contact

Academic
siegmund@stanford.edu
University - Emeritus faculty
Department: Statistics
Position: Emeritus Faculty, Acad Council
- Sequoia 140
- Stanford, California 94305-4065
(650) 725-8977 (fax)

Additional Info

Mail Code: 4065

2024-25 Courses

Independent Studies (2)
- Industrial Research for Statisticians
  STATS 398 (Aut, Win, Spr, Sum)
- Research
  STATS 399 (Aut, Win, Spr, Sum)
Prior Year Courses
2023-24 Courses
- Sequential Analysis
  STATS 223, STATS 323 (Aut)
2022-23 Courses
- Consulting Workshop
  STATS 390 (Win)
- Literature of Statistics
  STATS 319 (Win)
- Theory of Probability III
  MATH 230C, STATS 310C (Spr)
2021-22 Courses
- Consulting Workshop
  STATS 390 (Win)
- Literature of Statistics
  STATS 319 (Aut)
- Sequential Analysis
  STATS 223, STATS 323 (Win)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Will Hartog

All Publications

SEGMENTATION AND ESTIMATION OF CHANGE-POINT MODELS: FALSE POSITIVE CONTROL AND CONFIDENCE REGIONS ANNALS OF STATISTICS Fang, X., Li, J., Siegmund, D. 2020; 48 (3): 1615–47

View details for DOI 10.1214/19-AOS1861

View details for Web of Science ID 000551644000017
A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic acids research Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R. 2016; 44 (15)

Abstract

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

View details for DOI 10.1093/nar/gkw481

View details for PubMedID 27325742

View details for PubMedCentralID PMC5009736
POISSON APPROXIMATION FOR TWO SCAN STATISTICS WITH RATES OF CONVERGENCE ANNALS OF APPLIED PROBABILITY Fang, X., Siegmund, D. 2016; 26 (4): 2384-2418

View details for DOI 10.1214/15-AAP1150

View details for Web of Science ID 000383411200014
SCAN STATISTICS ON POISSON RANDOM FIELDS WITH APPLICATIONS IN GENOMICS ANNALS OF APPLIED STATISTICS Zhang, N. R., Yakir, B., Xia, L. C., Siegmund, D. 2016; 10 (2): 726-755

View details for DOI 10.1214/15-AOAS892

View details for Web of Science ID 000385029700008
HIGHER CRITICISM: p-VALUES AND PREDICTION ANNALS OF STATISTICS Li, J., Siegmund, D. 2015; 43 (3): 1323-1350

View details for DOI 10.1214/15-AOS1312

View details for Web of Science ID 000355768700013
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION ANNALS OF STATISTICS Xie, Y., Siegmund, D. 2013; 41 (2): 670-692

View details for DOI 10.1214/13-AOS1094

View details for Web of Science ID 000320488200010
Change-Points: From Sequential Detection to Biology and Back SEQUENTIAL ANALYSIS-DESIGN METHODS AND APPLICATIONS Siegmund, D. 2013; 32 (1): 2-14

View details for DOI 10.1080/07474946.2013.751834

View details for Web of Science ID 000323819400002
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION Information Theory and Applications Workshop Xie, Y., Siegmund, D. IEEE. 2013

View details for Web of Science ID 000321214400067
MODEL SELECTION FOR HIGH-DIMENSIONAL, MULTI-SEQUENCE CHANGE-POINT PROBLEMS STATISTICA SINICA Zhang, N. R., Siegmund, D. O. 2012; 22 (4): 1507-1538

View details for DOI 10.5705/ss.2010.257

View details for Web of Science ID 000311812800014
Spectrum Opportunity Detection with Weak and Correlated Signals 46th Asilomar Conference on Signals, Systems and Computers Xie, Y., Siegmund, D. IEEE. 2012: 128–132

View details for Web of Science ID 000320768400022
False discovery rate for scanning statistics BIOMETRIKA Siegmund, D. O., Zhang, N. R., Yakir, B. 2011; 98 (4): 979-985

View details for DOI 10.1093/biomet/asr057

View details for Web of Science ID 000297366000016
DETECTING SIMULTANEOUS VARIANT INTERVALS IN ALIGNED SEQUENCES ANNALS OF APPLIED STATISTICS Siegmund, D., Yakir, B., Zhang, N. R. 2011; 5 (2A): 645-668

View details for DOI 10.1214/10-AOAS400

View details for Web of Science ID 000295453300003
Joint Testing of Genotype and Ancestry Association in Admixed Families GENETIC EPIDEMIOLOGY Tang, H., Siegmund, D. O., Johnson, N. A., Romieu, I., London, S. J. 2010; 34 (8): 783-791

Abstract

Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family-design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

View details for DOI 10.1002/gepi.20520

View details for Web of Science ID 000284719100002

View details for PubMedID 21031451

View details for PubMedCentralID PMC3103820
Detecting simultaneous changepoints in multiple sequences BIOMETRIKA Zhang, N. R., Siegmund, D. O., Ji, H., Li, J. Z. 2010; 97 (3): 631-645

Abstract

We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.

View details for DOI 10.1093/biomet/asq025

View details for Web of Science ID 000280904000008

View details for PubMedID 22822250

View details for PubMedCentralID PMC3372242
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples GENETIC EPIDEMIOLOGY Dupuis, J., Shi, J., Manning, A. K., Benjamin, E. J., Meigs, J. B., Cupples, L. A., Siegmund, D. 2009; 33 (7): 617-627

Abstract

Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic, which in contrast to the likelihood ratio statistic can use nonparametric estimators of variability to achieve robustness of the false-positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity by descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study.

View details for DOI 10.1002/gepi.20413

View details for Web of Science ID 000271406100006

View details for PubMedID 19278016

View details for PubMedCentralID PMC2766029
Minimax optimality of the Shiryayev-Roberts change-point detection rule JOURNAL OF STATISTICAL PLANNING AND INFERENCE Siegmund, D. O., Yakir, B. 2008; 138 (9): 2815-2825

View details for DOI 10.1016/j.jspi.2008.03.016

View details for Web of Science ID 000256602600019
The distribution of maxima of approximately Gaussian random fields ANNALS OF STATISTICS Nardi, Y., Siegmund, D. O., Yakir, B. 2008; 36 (3): 1375-1403

View details for DOI 10.1214/07-AOS511

View details for Web of Science ID 000256504400014
Detecting the emergence of a signal in a noisy image STATISTICS AND ITS INTERFACE Siegmund, D., Yakir, B. 2008; 1 (1): 3-12

View details for Web of Science ID 000207654700002
A unified framework for linkage and association analysis of quantitative traits PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Dupuis, J., Siegmund, D. O., Yakir, B. 2007; 104 (51): 20210-20215

Abstract

We give a unified treatment of the statistical foundations of population based association mapping and of family based linkage mapping of quantitative traits in humans. A central ingredient in the unification involves the efficient score statistic. The discussion focuses on generalized linear models with an additional illustration of the Cox (proportional hazards) model for age of onset data. We give analytic expressions for noncentrality parameters and show how they give qualitative insight into the loss of power that occurs if the scientist's assumed genetic model differs from nature's "true" genetic model. Issues to be studied in detail in the future development of this approach are discussed.

View details for DOI 10.1073/pnas.0707138105

View details for Web of Science ID 000251885000013

View details for PubMedID 18077372

View details for PubMedCentralID PMC2154410
Importance sampling for estimating p values in linkage analysis JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Shi, J., Siegmund, D., Yakir, B. 2007; 102 (479): 929-937

View details for DOI 10.1198/016214507000000680

View details for Web of Science ID 000249752300021
A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data BIOMETRICS Zhang, N. R., Siegmund, D. O. 2007; 63 (1): 22-32

Abstract

In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.

View details for DOI 10.1111/j.1541-0420.2006.00662.x

View details for Web of Science ID 000244647100003

View details for PubMedID 17447926
Statistical corrections of linkage data suggest predominantly cis regulations of gene expression. BMC proceedings Shi, J., Siegmund, D. O., Levinson, D. F. 2007; 1: S145-?

Abstract

Morley et al. (Nature 2004, 430:743-747) detected significant linkages to the expression levels of 142 genes (of 3554) at a reported threshold of genome-wide p = 0.001 (LOD asymptotically equal to 5.3), using 14 three-generation Centre d'Etude du Polymorphisme Humain pedigrees. Most of the linkages (77%) were trans, i.e., more than 5 Mb from the expressed gene. However, the analysis did not account for the expected anti-conservative effect of the skewed distribution of score- or regression-based statistics in large sibships, or for the possible variance distortion due to correlations among tests. Therefore, we re-analyzed their data, using a robust score statistic for the entire pedigrees and correcting the p-values for skewness. We found that a LOD of 5.3 had a skewness-corrected genome-wide p-value of 0.016 instead of 0.001 (a result that we confirmed using simulation), with around 50 expected false positives. We then further corrected for correlation among the (skew-corrected) p-values by using Efron's method for obtaining the empirical null distribution. Setting a threshold of FDR = 10% (Z = 6.4, LOD = 8.9), we detected linkage for the expression levels of 22 genes, 19 of which are cis. Limiting the analysis to cis regions, linkage was detected to the expression levels of 46 genes with 4.6 expected false positives (FDR = 10%).

View details for PubMedID 18466489
Approximating the variance of the conditional probability of the state of a hidden Markov model STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY Siegmund, D. O., Yakir, B. 2007; 6

Abstract

In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe directly the state of the hidden process. In the case where changes of state occur slowly relative to the speed at which information about the underlying state accumulates in the observed data, we compute approximately these covariances in terms of functionals of Brownian motion that arise in change-point analysis. Applications in gene mapping, where these covariances play a role in standardizing the score statistic and in evaluating the loss of noncentrality due to incomplete information, are discussed. Numerical examples illustrate the range of validity and limitations of our results.

View details for Web of Science ID 000252387000001

View details for PubMedID 17672820
QTL mapping under ascertainment ANNALS OF HUMAN GENETICS Peng, J., Siegmund, D. 2006; 70: 867-881

Abstract

Mapping quantitative trait loci (QTL) using ascertained sibships is discussed. It is shown that under the standard normality assumption of variance components analysis the efficient scores are unchanged by ascertainment, and two different schemes of ascertainment correction suggested in the literature are asymptotically equivalent. The use of conditional maximum likelihood estimators derived under the normality assumption to estimate nuisance parameters is shown to result in only a small loss of power compared to the case of known parameters, even when the distribution of phenotypes is non-normal and/or the ascertainment criterion is ill defined.

View details for DOI 10.1111/j.1469-1809.2006.00286.x

View details for Web of Science ID 000241191400019

View details for PubMedID 17044862
Spatial regulation and the rate of signal transduction activation PLOS COMPUTATIONAL BIOLOGY Batada, N. N., Shepp, L. A., Siegmund, D. O., Levitt, M. 2006; 2 (5): 343-349

Abstract

Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.

View details for DOI 10.1371/journal.pcbi.0020044

View details for Web of Science ID 000239493900003

View details for PubMedID 16699596

View details for PubMedCentralID PMC1458967
Genome scans with gene-covariate interaction GENETIC EPIDEMIOLOGY Peng, J., Tang, H. K., Siegmund, D. 2005; 29 (3): 173-184

Abstract

Genetic models for gene-covariate interactions are described. Methods of linkage analysis that utilize special features of these models and the corresponding score statistics are derived. Their power is compared with that of simple genome scans that ignore these special features, and substantial gains in power are observed when the gene-covariate interaction is strong. Quantitative trait mapping in randomly ascertained sibships and affected sibpair mapping are discussed. For the latter case, a simpler statistic is proposed that has similar performance to the score statistic, but does not require the estimation of nuisance parameters. Since the nuisance parameters are not estimable solely from affected sib-pair data, this statistic would be much easier to apply in practice. Similarities with linkage analysis of models for longitudinal data and multivariate phenotypes are also briefly discussed. Approximations for the P-value and power are derived under the framework of local alternatives.

View details for DOI 10.1002/gepi.20100

View details for Web of Science ID 000233059200001

View details for PubMedID 16216012
An urn model of Diaconis ANNALS OF PROBABILITY Siegmund, D., Yakir, B. 2005; 33 (5): 2036-2042

View details for DOI 10.1214/009117905000000314

View details for Web of Science ID 000232345300012
On the power for linkage detection using a test based on scan statistics BIOSTATISTICS Hernandez, S., Siegmund, D. O., de Gunst, M. 2005; 6 (2): 259-269

Abstract

We analyze some aspects of scan statistics, which have been proposed to aid the detection of weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity by descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.

View details for Web of Science ID 000228428200007

View details for PubMedID 15772104
The admixture model in linkage analysis JOURNAL OF STATISTICAL PLANNING AND INFERENCE Peng, J., Siegmund, D. 2005; 130 (1-2): 317-324

View details for DOI 10.1016/j.jspi.2003.07.022

View details for Web of Science ID 000226645200019
Model selection in irregular problems: Applications to mapping quantitative trait loci BIOMETRIKA Siegmund, D. 2004; 91 (4): 785-800

View details for Web of Science ID 000225940000002
A report on the future of statistics STATISTICAL SCIENCE Lindsay, B. G., Kettenring, J., Siegmund, D. O. 2004; 19 (3): 387-407

View details for DOI 10.1214/088342304000000404

View details for Web of Science ID 000227884700001
Mapping quantitative traits with random and with ascertained sibships PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Peng, J., Siegmund, D. 2004; 101 (21): 7845-7850

Abstract

Use of a robust score statistic based on a variance components model to map quantitative trait loci in randomly sampled pedigrees is reviewed. Sibships ascertained through a single proband are discussed. Under a standard assumption of multivariate normality, two suggested methods of ascertainment correction are shown to be asymptotically equivalent when the number of sibships is large.

View details for DOI 10.1073/pnas.0401713101

View details for Web of Science ID 000221652000003

View details for PubMedID 15084737

View details for PubMedCentralID PMC419519
Stochastic model of protein-protein interaction: Why signaling proteins need to be colocalized PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Batada, N. N., Shepp, L. A., Siegmund, D. O. 2004; 101 (17): 6445-6449

Abstract

Colocalization of proteins that are part of the same signal transduction pathway via compartmentalization, scaffold, or anchor proteins is an essential aspect of the signal transduction system in eukaryotic cells. If interaction must occur via free diffusion, then the spatial separation between the sources of the two interacting proteins and their degradation rates become primary determinants of the time required for interaction. To understand the role of such colocalization, we create a mathematical model of the diffusion based protein-protein interaction process. We assume that mRNAs, which serve as the sources of these proteins, are located at different positions in the cytoplasm. For large cells such as Drosophila oocytes we show that if the source mRNAs were at random locations in the cell rather than colocalized, the average rate of interactions would be extremely small, which suggests that localization is needed to facilitate protein interactions and not just to prevent cross-talk between different signaling modules.

View details for DOI 10.1073/pnas.0401314101

View details for Web of Science ID 000221107900023

View details for PubMedID 15096590

View details for PubMedCentralID PMC404064
Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY Storey, J. D., Taylor, J. E., Siegmund, D. 2004; 66: 187-205

View details for Web of Science ID 000187448400012
Gene expression patterns and gene copy number changes in dermatofibrosarcoma protuberans AMERICAN JOURNAL OF PATHOLOGY Linn, S. C., West, R. B., Pollack, J. R., Zhu, S., Hernandez-Boussard, T., Nielsen, T. O., Rubin, B. P., Patel, R., Goldblum, J. R., Siegmund, D., Botstein, D., Brown, P. O., Gilks, C. B., van de Rijn, M. 2003; 163 (6): 2383-2395

Abstract

Dermatofibrosarcoma protuberans (DFSP) is an aggressive spindle cell neoplasm. It is associated with the chromosomal translocation, t(17:22), which fuses the COL1A1 and PDGFbeta genes. We determined the characteristic gene expression profile of DFSP and characterized DNA copy number changes in DFSP by array-based comparative genomic hybridization (array CGH). Fresh frozen and formalin-fixed, paraffin-embedded samples of DFSP were analyzed by array CGH (four cases) and DNA microarray analysis of global gene expression (nine cases). The nine DFSPs were readily distinguished from 27 other diverse soft tissue tumors based on their gene expression patterns. Genes characteristically expressed in the DFSPs included PDGF beta and its receptor, PDGFRB, APOD, MEOX1, PLA2R, and PRKCA. Array CGH of DNA extracted either from frozen tumor samples or from paraffin blocks yielded equivalent results. Large areas of chromosomes 17q and 22q, bounded by COL1A1 and PDGF beta, respectively, were amplified in DFSP. Expression of genes in the amplified regions was significantly elevated. Our data shows that: 1) DFSP has a distinctive gene expression profile; 2) array CGH can be applied successfully to frozen or formalin-fixed, paraffin-embedded tumor samples; 3) a characteristic amplification of sequences from chromosomes 17q and 22q, demarcated by the COL1A1 and PDGF beta genes, respectively, was associated with elevated expression of the amplified genes.

View details for PubMedID 14633610
Rotation space random fields with an application to fMRI data ANNALS OF STATISTICS Shafie, K., Sigal, B., Siegmund, D., Worsley, K. J. 2003; 31 (6): 1732-1771

View details for Web of Science ID 000188780400002
Statistical analysis of direct identity-by-descent mapping ANNALS OF HUMAN GENETICS Siegmund, D., Yakir, B. 2003; 67: 464-470

Abstract

Genetic mismatch scanning has been suggested as a method for using affected pairs of ostensibly unrelated but putatively distantly related affecteds in isolated populations to map disease genes. We model the regions of identity-by-descent of these affected pairs as a continuous time two state process with unknown parameters that depend on the (unknown) relationships, and we estimate the unknown parameters from the observed data. Simulated data involving pairs of first to fourth cousins show that the procedure thus obtained has properties similar, albeit slightly inferior, to the case where the relationships of the affected pairs, hence the parameters governing the processes, are known.

View details for Web of Science ID 000185326600007

View details for PubMedID 12940919
Upward bias in estimation of genetic effects AMERICAN JOURNAL OF HUMAN GENETICS Siegmund, D. 2002; 71 (5): 1183-1188

Abstract

Because of the large number of tests for linkage that are performed in genome scans, the naive estimator of the size of a genetic effect in cases of borderline significance can be inflated and lead to unrealistic expectations for successful replication. As a remedy, this report proposes lower confidence limits that account for the multiple comparisons of the genome scan.

View details for Web of Science ID 000178884300016

View details for PubMedID 12386837

View details for PubMedCentralID PMC385094
Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition GENETICS Tang, H., Siegmund, D. O., Shen, P. D., Oefner, P. J., Feldman, M. W. 2002; 161 (1): 447-459

Abstract

This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.

View details for Web of Science ID 000175814900040

View details for PubMedID 12019257
Mapping multiple genes for quantitative or complex traits GENETIC EPIDEMIOLOGY Tang, H. K., Siegmund, D. 2002; 22 (4): 313-327

Abstract

Models for complex and quantitative traits that involve multiple, possibly interacting, genes are described. Methods of linkage analysis are developed that utilize special features of these models, and their power is compared with that of simple genome scans that ignore these special features. Our calculations show that for family-based nonparametric linkage analysis in human genetics, in contrast to experimental genetics, there are limits to the increase in power that can be achieved by correctly modeling gene-gene interactions. In particular, the noncentrality parameter of likelihood-based statistics to detect single gene effects involves both single gene and interaction components of variance, so even when the interaction components of variance are relatively large, the incremental power from a statistic designed to detect both single gene and interaction effects is often quite modest. We carry out our analysis with the assistance of a parameterization that allows us to compute score statistics, noncentrality parameters, and Fisher information matrices reasonably explicitly.

View details for DOI 10.1002/gepi.01108

View details for Web of Science ID 000175413700004

View details for PubMedID 11984864
Mapping quantitative trait loci in oligogenic models. Biostatistics Tang, H. K., Siegmund, D. 2001; 2 (2): 147-162

Abstract

We discuss strategies for mapping quantitative trait loci with emphasis on certain issues of study design that have recently received attention: e.g. genotyping only selected pedigrees and the comparative value of large pedigrees versus sib pairs. We use a standard variance components model and a parametrization of the genetic effects in which the 'segregation' parameters are locally orthogonal to the 'linkage' parameters. This permits simple explicit expressions for the expectation of the score statistic, which we use to compare the power of different strategies. We also discuss robustness of the score statistic.

View details for PubMedID 12933546
Is peak height sufficient? GENETIC EPIDEMIOLOGY Siegmund, D. 2001; 20 (4): 403-408

Abstract

The suggestion that more power can be obtained from a genome scan by consideration of "peak width" in addition to "peak height" has been controversial. Regarding this question from the viewpoint of smoothing, one finds that to the extent that smoothing increases the informativeness of individual markers it is possible to obtain increased power; but for markers that are fully informative the value of smoothing is questionable.

View details for Web of Science ID 000168316300001

View details for PubMedID 11319781
Approximate p-values for local sequence alignments: Numerical studies JOURNAL OF COMPUTATIONAL BIOLOGY Storey, J. D., Siegmund, D. 2001; 8 (5): 549-556

Abstract

Siegmund and Yakir (2000) have given an approximate p-value when two independent, identically distributed sequences from a finite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an infinite sequence of difficult-to-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and affine gap penalty, this modified approximation is easily evaluated. Comparison with published numerical results shows that it is reasonably accurate.

View details for Web of Science ID 000171950200006

View details for PubMedID 11694182
Note on a stochastic recursion Symposium on State of the Art in Probability and Statistics: Festschrift for Willem R VanZwet Siegmund, D. INST MATHEMATICAL STATISTICS. 2001: 547–554

View details for Web of Science ID 000175458600029
Approximate p-values for local sequence alignments ANNALS OF STATISTICS Siegmund, D., Yakir, B. 2000; 28 (3): 657-680

View details for Web of Science ID 000165456000001
Tail probabilities for the null distribution of scanning statistics BERNOULLI Siegmund, D., Yakir, B. 2000; 6 (2): 191-213

View details for Web of Science ID 000086645400001
The maximum of a function of a Markov chain and application to linkage analysis ADVANCES IN APPLIED PROBABILITY Tu, I. P., Siegmund, D. 1999; 31 (2): 510-531

View details for Web of Science ID 000083438800011
Statistical methods for mapping quantitative trait loci from a dense set of markers GENETICS Dupuis, J., Siegmund, D. 1999; 151 (1): 373-386

Abstract

Lander and Botstein introduced statistical methods for searching an entire genome for quantitative trait loci (QTL) in experimental organisms, with emphasis on a backcross design and QTL having only additive effects. We extend their results to intercross and other designs, and we compare the power of the resulting test as a function of the magnitude of the additive and dominance effects, the sample size and intermarker distances. We also compare three methods for constructing confidence regions for a QTL: likelihood regions, Bayesian credible sets, and support regions. We show that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and we provide a theoretical explanation of the empirical observation that the size of the support region is proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory.

View details for Web of Science ID 000078028900034

View details for PubMedID 9872974

View details for PubMedCentralID PMC1460471
Multipoint linkage analysis using affected relative pairs and partially informative markers BIOMETRICS Teng, J., Siegmund, D. 1998; 54 (4): 1247-1265

Abstract

Linkage analysis is a method of identifying regions of the human genome harboring genes affecting the risk for a particular disease. It works by finding chromosomal segments inherited by affected relatives from a common ancestor (i.e., identical by descent or IBD) in excess of that expected by chance. Two complicating factors are that only a relatively small number of genomic locations (marker loci) are examined and the number of distinct realizations (alleles) at each marker is not large. Hence, unambiguous determination of IBD is impossible for any genomic location without additional information. Assuming data from a set of mapped, partially informative markers, we evaluate the effectiveness of a method that analyzes the array of markers on each chromosome jointly (multipoint methods) as a function of the informativeness and density of the markers. For the special case of pairs of half siblings whose parents are also typed, a combination of analysis and simulation is used to obtain insight into the problem of setting thresholds to control the false-positive error rate. Approximations are given for the power, and guidelines are developed to help describe the trade-offs between marker density and informativeness.

View details for Web of Science ID 000077898700004

View details for PubMedID 9883537
Combining information within and between pedigrees for mapping complex traits AMERICAN JOURNAL OF HUMAN GENETICS Teng, J., Siegmund, D. 1997; 60 (4): 979-992

Abstract

This paper is concerned with efficient strategies for gene mapping using pedigrees containing small numbers of affecteds and identity-by-descent data from closely spaced markers throughout the genome. Particular attention is paid to additive traits involving phenocopies and/or locus heterogeneity. For a sample of pedigrees containing a particular configuration of affecteds, e.g., pairs of siblings together with a first cousin, we use a likelihood analysis to find 1-df statistics that are very efficient over a broad range of penetrances and allele frequencies. We identify configurations of affecteds that are particularly powerful for detecting linkage, and we show how pedigrees containing different numbers and configurations of affecteds can be efficiently combined in an overall test statistic.

View details for Web of Science ID A1997WT61400028

View details for PubMedID 9106545
Strategies for mapping heterogeneous recessive traits by allele-sharing methods AMERICAN JOURNAL OF HUMAN GENETICS Feingold, E., Siegmund, D. O. 1997; 60 (4): 965-978

Abstract

We investigate strategies for detecting linkage of recessive and partially recessive traits, using sibling pairs and inbred individuals. We assume that a genomewide search is being conducted and that locus heterogeneity of the trait is likely. For sibling pairs, we evaluate the efficiency of different statistics under the assumption that one does not know the true degree of recessiveness of the trait. We recommend a sibling-pair statistic that is a linear compromise between two previously suggested statistics. We also compare the power of sibling pairs to that of more distant relatives, such as cousins. For inbred individuals, we evaluate the power of offspring of different types of matings and compare them to sibling pairs. Over a broad range of trait etiologies, sibling pairs are more powerful than inbred individuals, but for traits caused by very rare alleles, particularly in the case of heterogeneity, inbred individuals can be much more powerful. The models we develop can also be used to examine specific situations other than those we look at. We present this analysis in the idealized context of a dense set of highly polymorphic markers. In general, incorporation of real-world complexities makes inbred individuals, particularly offspring of distant relatives, look slightly less useful than our results imply.

View details for Web of Science ID A1997WT61400027

View details for PubMedID 9106544

View details for PubMedCentralID PMC1712456
The approximate distribution of the maximum of a smoothed Poisson random field STATISTICA SINICA Rabinowitz, D., Siegmund, D. 1997; 7 (1): 167-180

View details for Web of Science ID A1997WF57100011
STATISTICAL-METHODS FOR LINKAGE ANALYSIS OF COMPLEX TRAITS FROM HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT GENETICS Dupuis, J., Brown, P. O., Siegmund, D. 1995; 140 (2): 843-856

Abstract

A multilocus model for complex traits is described that generalizes the additive and multiplicative models and hence allows simultaneously for both heterogeneity and gene interaction (epistasis). Statistical methods of linkage analysis are discussed under the assumption that identity by descent data from a dense set of polymorphic markers are available. Three methods, single locus search, simultaneous search and conditional search, are described and compared.

View details for Web of Science ID A1995RA36600035

View details for PubMedID 7498758

View details for PubMedCentralID PMC1206656
TESTING FOR A SIGNAL WITH UNKNOWN LOCATION AND SCALE IN A STATIONARY GAUSSIAN RANDOM-FIELD ANNALS OF STATISTICS Siegmund, D. O., Worsley, K. J. 1995; 23 (2): 608-639

View details for Web of Science ID A1995RH23800016
USING THE GENERALIZED LIKELIHOOD RATIO STATISTIC FOR SEQUENTIAL DETECTION OF A CHANGE-POINT ANNALS OF STATISTICS Siegmund, D., Venkatraman, E. S. 1995; 23 (1): 255-271

View details for Web of Science ID A1995RE61100016
Confidence regions in broken line regression AMS/IMS/SIAM Summer Research Conference on Change-Point Problems Siegmund, D. O., Zhang, H. P. INST MATHEMATICAL STATISTICS. 1994: 292–316

View details for Web of Science ID A1994BF61K00023
THE EXPECTED NUMBER OF LOCAL MAXIMA OF A RANDOM-FIELD AND THE VOLUME OF TUBES ANNALS OF STATISTICS Siegmund, D., Zhang, H. P. 1993; 21 (4): 1948-1966

View details for Web of Science ID A1993MQ47700013
GAUSSIAN MODELS FOR GENETIC-LINKAGE ANALYSIS USING COMPLETE HIGH-RESOLUTION MAPS OF IDENTITY BY DESCENT AMERICAN JOURNAL OF HUMAN GENETICS Feingold, E., Brown, P. O., Siegmund, D. 1993; 53 (1): 234-251

Abstract

Gaussian-process models are developed to detect genetic linkage using complete high-resolution maps of identity by descent between affected relative pairs. Approximations are given for the significance level and power of the likelihood-ratio test of no linkage and for likelihood-ratio confidence regions for trait loci. The sample sizes required to detect linkage by using different classes of affected relative pairs are compared, and the problem of combining data from different classes of relatives is discussed.

View details for Web of Science ID A1993LJ38500027

View details for PubMedID 8317489

View details for PubMedCentralID PMC1682227
A SEQUENTIAL CLINICAL-TRIAL FOR COMPARING 3 TREATMENTS ANNALS OF STATISTICS Siegmund, D. 1993; 21 (1): 464-483

View details for Web of Science ID A1993LB38900026
ASYMPTOTIC APPROXIMATIONS FOR LIKELIHOOD RATIO TESTS AND CONFIDENCE-REGIONS FOR A CHANGE-POINT IN THE MEAN OF A MULTIVARIATE NORMAL-DISTRIBUTION STATISTICA SINICA James, B., James, K. L., Siegmund, D. 1992; 2 (1): 69-90

View details for Web of Science ID A1992HC09200004
SEQUENTIAL DETECTION OF A CHANGE IN A NORMAL-MEAN WHEN THE INITIAL-VALUE IS UNKNOWN ANNALS OF STATISTICS Pollak, M., Siegmund, D. 1991; 19 (1): 394-416

View details for Web of Science ID A1991FF04700028
CONFIDENCE-REGIONS IN SEMILINEAR REGRESSION BIOMETRIKA Knowles, M., Siegmund, D., Zhang, H. P. 1991; 78 (1): 15-31

View details for Web of Science ID A1991FD52300003
ON HOTELLINGS APPROACH TO TESTING FOR A NONLINEAR PARAMETER IN REGRESSION INTERNATIONAL STATISTICAL REVIEW Knowles, M., Siegmund, D. 1989; 57 (3): 205-220

View details for Web of Science ID A1989CE41900002
THE LIKELIHOOD RATIO TEST FOR A CHANGE-POINT IN SIMPLE LINEAR-REGRESSION BIOMETRIKA Kim, H. J., Siegmund, D. 1989; 76 (3): 409-423

View details for Web of Science ID A1989AR09100001
APPROXIMATE EXIT PROBABILITIES FOR A BROWNIAN BRIDGE ON A SHORT-TIME INTERVAL, AND APPLICATIONS ADVANCES IN APPLIED PROBABILITY Lerche, H. R., Siegmund, D. 1989; 21 (1): 1-19

View details for Web of Science ID A1989T648700001
ON HOTELLING FORMULA FOR THE VOLUME OF TUBES AND NAIMAN INEQUALITY ANNALS OF STATISTICS Johnstone, I., Siegmund, D. 1989; 17 (1): 184-194

View details for Web of Science ID A1989T614100009
CONDITIONAL BOUNDARY CROSSING PROBABILITIES, WITH APPLICATIONS TO CHANGE-POINT PROBLEMS ANNALS OF PROBABILITY James, B., James, K. L., Siegmund, D. 1988; 16 (2): 825-839

View details for Web of Science ID A1988M945700023
CONFIDENCE SETS IN CHANGE-POINT PROBLEMS INTERNATIONAL STATISTICAL REVIEW Siegmund, D. 1988; 56 (1): 31-48

View details for Web of Science ID A1988M962700003
APPROXIMATE TAIL PROBABILITIES FOR THE MAXIMA OF SOME RANDOM-FIELDS ANNALS OF PROBABILITY Siegmund, D. 1988; 16 (2): 487-501

View details for Web of Science ID A1988M945700003
BOUNDARY CROSSING PROBABILITIES AND STATISTICAL APPLICATIONS ANNALS OF STATISTICS Siegmund, D. 1986; 14 (2): 361-404

View details for Web of Science ID A1986D171100001
LARGE DEVIATIONS FOR THE MAXIMA OF SOME RANDOM-FIELDS ADVANCES IN APPLIED MATHEMATICS Hogan, M. L., Siegmund, D. 1986; 7 (1): 2-22

View details for Web of Science ID A1986A881100002
CONVERGENCE OF QUASI-STATIONARY TO STATIONARY DISTRIBUTIONS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES JOURNAL OF APPLIED PROBABILITY Pollak, M., Siegmund, D. 1986; 23 (1): 215-220

View details for Web of Science ID A1986C400600020
A DIFFUSION PROCESS AND ITS APPLICATIONS TO DETECTING A CHANGE IN THE DRIFT OF BROWNIAN-MOTION BIOMETRIKA Pollak, M., Siegmund, D. 1985; 72 (2): 267-280

View details for Web of Science ID A1985AMZ4700004
SEQUENTIAL-ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL BIOMETRIKA Sellke, T., Siegmund, D. 1983; 70 (2): 315-326

View details for Web of Science ID A1983RA65800002
FIXED ACCURACY ESTIMATION OF AN AUTOREGRESSIVE PARAMETER ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1983; 11 (2): 478-485

View details for Web of Science ID A1983QT81500010
MAXIMALLY SELECTED CHI-SQUARE STATISTICS BIOMETRICS Miller, R., Siegmund, D. 1982; 38 (4): 1011-1016

View details for Web of Science ID A1982QF89700013
CONTINUOUS INTRAVENOUS VASOPRESSIN IN ACTIVE UPPER GASTROINTESTINAL-BLEEDING - A PLACEBO-CONTROLLED TRIAL ANNALS OF INTERNAL MEDICINE Fogel, M. R., Knauer, C. M., Andres, L. L., Mahal, A. S., Stein, D. E., Kemeny, M. J., Rinki, M. M., Walker, J. E., Siegmund, D., Gregory, P. B. 1982; 96 (5): 565-569

Abstract

Sixty patients with active upper gastrointestinal bleeding were randomized to receive either continuous intravenous infusions of vasopressin (29 patients) or placebo (31 patients) at a rate of 40 U/h. Six hours after beginning the study, 13 patients in the vasopressin group and 11 in the placebo group had ceased bleeding (p = 0.46). By 24 hours, 17 patients in the vasopressin group and 14 in the placebo group had stopped bleeding (p = 0.30). Restriction of the analysis to patients bleeding from varices showed no advantage with vasopressin treatment after 6 or 24 hours. No consistent trend favoring use of vasopressin to stop hemorrhage was noted during the 30-month study period. There was little difference between the two groups in the number of patients needing surgery (13 on vasopressin, 18 on placebo; p = 0.30) or the number of deaths (eight on vasopressin, 11 on placebo; p = 0.51); the transfusion requirement was the same. In our patients, a continuous intravenous infusion of vasopressin neither controlled bleeding nor altered outcome.

View details for Web of Science ID A1982NP94900004

View details for PubMedID 7041728
BROWNIAN APPROXIMATIONS TO 1ST PASSAGE PROBABILITIES ZEITSCHRIFT FUR WAHRSCHEINLICHKEITSTHEORIE UND VERWANDTE GEBIETE Siegmund, D., Yuh, Y. S. 1982; 59 (2): 239-248

View details for Web of Science ID A1982NJ23400009
LARGE DEVIATIONS FOR BOUNDARY CROSSING PROBABILITIES ANNALS OF PROBABILITY Siegmund, D. 1982; 10 (3): 581-588

View details for Web of Science ID A1982PC77000005
SEQUENTIAL χ² AND F-TESTS AND THE RELATED CONFIDENCE-INTERVALS BIOMETRIKA Siegmund, D. 1980; 67 (2): 389-402

View details for Web of Science ID A1980KA99800013
A SEQUENTIAL CLINICAL-TRIAL FOR TESTING P1=P2 ANNALS OF STATISTICS Siegmund, D., Gregory, P. 1980; 8 (6): 1219-1228

View details for Web of Science ID A1980KT80600003
CORRECTED DIFFUSION APPROXIMATIONS IN CERTAIN RANDOM-WALK PROBLEMS ADVANCES IN APPLIED PROBABILITY Siegmund, D. 1979; 11 (4): 701-719

View details for Web of Science ID A1979HV96600002
NON-LINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS .2. ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1979; 7 (1): 60-76

View details for Web of Science ID A1979GL00300004
ESTIMATION FOLLOWING SEQUENTIAL TESTS BIOMETRIKA Siegmund, D. 1978; 65 (2): 341-349

View details for Web of Science ID A1978FL40100012
NONLINEAR RENEWAL THEORY WITH APPLICATIONS TO SEQUENTIAL-ANALYSIS I ANNALS OF STATISTICS Lai, T. L., Siegmund, D. 1977; 5 (5): 946-954

View details for Web of Science ID A1977DV19800010
REPEATED SIGNIFICANCE TESTS FOR A NORMAL MEAN BIOMETRIKA Siegmund, D. 1977; 64 (2): 177-189

View details for Web of Science ID A1977DQ65700001
EQUIVALENCE OF ABSORBING AND REFLECTING BARRIER PROBLEMS FOR STOCHASTICALLY MONOTONE MARKOV-PROCESSES ANNALS OF PROBABILITY Siegmund, D. 1976; 4 (6): 914-924

View details for Web of Science ID A1976CN99500004
PROBABILITY DISTRIBUTIONS RELATED TO LAW OF ITERATED LOGARITHM PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Robbins, H., Siegmund, D. 1969; 62 (1): 11-?

Abstract

Let W(t) denote a standard Wiener process for 0 ≤ t < ∞. We compute the probability that W(t) ≥ t^(1/2) A(t) for some t ≥ 1 (or for some t ≥ 0) for a certain class of functions A(t), including functions which are approximately (2 log log t)^(1/2) as t → ∞. We also give an invariance principle which states that this probability is the limit as m → ∞ of the probability that s(n) ≥ n^(1/2) A(n/m) for some n ≥ m (or for some n ≥ 1), where s(n) is the sum of n independent and identically distributed random variables with mean 0 and variance 1.

View details for Web of Science ID A1969C892200002

View details for PubMedID 16591726

View details for PubMedCentralID PMC285947

David Siegmund

John D. and Sigrid Banks Professor, Emeritus

Statistics
