Ph.D., Yale University, Genome-wide mapping and functional analysis of copy number variation in the human genome (2012)
MSc., Yale University, Molecular, Cellular and Developmental Biology (2008)
S.B., Massachusetts Institute of Technology, Mathematics (2005)
S.B., Massachusetts Institute of Technology, Biology (2005)
Community and International Work
Rare Genomics Institute
Scientific Affairs, Strategic Alliances
Opportunities for Student Involvement
Current Research and Scholarly Interests
Using high-resolution methods to map copy number variants (CNVs) in human genomes and to establish associations between CNVs and specific phenotypes.
Mapping genetic variants in the human genome with next-generation sequencing technologies, and integrating these variants to identify phenotypically relevant biological pathways.
- Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway GENETICS IN MEDICINE 2014; 16 (10): 751-758
Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss.
2014; 15: 1155-?
The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 × 10^-7). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.
DOI: 10.1186/1471-2164-15-1155
PubMed ID: 25528277
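The odds ratio and 95% confidence interval quoted in the abstract above summarize a 2×2 table of deletion carriers and non-carriers among cases and controls. The sketch below shows one common way to compute both, Woolf's log method. It is an illustration rather than the study's analysis code: the abstract does not say which interval method was used, and the carrier counts in the example are hypothetical because the abstract does not report them.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: cases carrying the deletion     b: cases without it
    c: controls carrying the deletion  d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts (NOT the study's data): 30 of 150 cases and
# 10 of 157 controls carry the deletion.
or_, lo, hi = odds_ratio_ci(30, 120, 10, 147)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric on the log scale, it extends further above the point estimate than below it. The reported interval shows the same pattern: 3.91 / 1.62 and 9.40 / 3.91 are both about 2.4.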
Impacts of Variation in the Human Genome on Gene Regulation
JOURNAL OF MOLECULAR BIOLOGY
2013; 425 (21): 3970-3977
Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.
DOI: 10.1016/j.jmb.2013.07.015
Web of Science ID: 000326257100007
Computational and Bioinformatics Frameworks for Next-Generation Whole Exome and Genome Sequencing
SCIENTIFIC WORLD JOURNAL
It has become increasingly apparent that one of the major hurdles in the genomic age will be the bioinformatics challenges of next-generation sequencing. We provide an overview of a general framework of bioinformatics analysis. For each of the three stages of (1) alignment, (2) variant calling, and (3) filtering and annotation, we describe the analysis required and survey the different software packages that are used. Furthermore, we discuss possible future developments as data sources grow and highlight opportunities for new bioinformatics tools to be developed.
DOI: 10.1155/2013/730210
Web of Science ID: 000314128300001
PubMed ID: 23365548
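The third stage described in this abstract, filtering and annotation, can be illustrated with a minimal Python sketch that works on VCF text. It is not taken from any of the surveyed software packages. The quality threshold (QUAL ≥ 30), the rare-variant cutoff (AF ≤ 1%), the `genes` interval table, and every function name are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    filt: str
    info: dict = field(default_factory=dict)


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed columns of one VCF data line."""
    chrom, pos, _vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info_dict[key] = value
        elif item and item != ".":
            info_dict[item] = True  # flag-type INFO field
    return Variant(chrom, int(pos), ref, alt,
                   float(qual) if qual != "." else 0.0, filt, info_dict)


def filter_and_annotate(lines, genes, min_qual=30.0, max_af=0.01):
    """Keep PASS, high-quality, rare variants and tag the gene each falls in.

    genes maps a (hypothetical) gene name to (chrom, start, end), 1-based inclusive.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF meta-information and header lines
        v = parse_vcf_line(line)
        if v.filt not in ("PASS", ".") or v.qual < min_qual:
            continue
        # AF may hold one value per ALT allele; use the most frequent allele.
        af = max(float(x) for x in str(v.info.get("AF", "0")).split(","))
        if af > max_af:
            continue
        gene = next((name for name, (c, start, end) in genes.items()
                     if c == v.chrom and start <= v.pos <= end), None)
        kept.append((v, gene))
    return kept


# Hypothetical gene interval and toy VCF records.
genes = {"GENE_A": ("chr1", 1000, 2000)}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1500\t.\tA\tG\t60\tPASS\tAF=0.001",   # rare, in GENE_A: kept
    "chr1\t1600\t.\tC\tT\t60\tPASS\tAF=0.25",    # common: dropped
    "chr1\t5000\t.\tG\tA\t12\tPASS\tAF=0.0005",  # low quality: dropped
    "chr2\t300\t.\tT\tC\t50\tPASS\tDP=40",       # no AF, no gene: kept
]
for variant, gene in filter_and_annotate(vcf, genes):
    print(variant.chrom, variant.pos, variant.ref, variant.alt, gene)
```

In this sketch, a variant with no AF annotation is treated as rare. A real pipeline would more likely look up allele frequencies in a population database at this stage.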
Child Development and Structural Variation in the Human Genome
2013; 84 (1): 34-48
Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects of structural variation on normal child development, but such effects could be of considerable significance. This review provides an overview of the phenomenon of structural variation in the human genome sequence, describing the novel genomics technologies that are revolutionizing the way structural variation is studied and giving examples of genomic structural variations that affect child development.
DOI: 10.1111/cdev.12051
Web of Science ID: 000314112000003
PubMed ID: 23311762
- Personalizing rare disease research: how genomics is revolutionizing the diagnosis and treatment of rare disease PERSONALIZED MEDICINE 2012; 9 (8): 805-819
Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes
2012; 148 (6): 1293-1307
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
DOI: 10.1016/j.cell.2012.02.009
Web of Science ID: 000301889500023
PubMed ID: 22424236
- Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY 2012; 30 (3): 226-229
Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms
2011; 6 (11)
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, for genome-wide CNV association studies, for cytogenetics research and diagnostics, and for independent validation of CNVs identified by sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genomic Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping, or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV-focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra-high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV-focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications.
DOI: 10.1371/journal.pone.0027859
Web of Science ID: 000298168100021
PubMed ID: 22140474
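Sensitivity against a gold standard is the fraction of validated CNVs that a platform's calls recover. The abstract above does not state how a call was matched to a gold-standard CNV. The sketch below therefore assumes a 50% reciprocal-overlap rule, a common convention in CNV benchmarking. The intervals and function names are hypothetical.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if half-open intervals a and b, given as (chrom, start, end),
    overlap by at least min_frac of each interval's own length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def sensitivity(gold, calls, min_frac=0.5):
    """Fraction of gold-standard CNVs matched by at least one platform call."""
    if not gold:
        raise ValueError("empty gold standard")
    hits = sum(any(reciprocal_overlap(g, c, min_frac) for c in calls) for g in gold)
    return hits / len(gold)


# Hypothetical gold-standard CNVs and platform calls.
gold = [("chr1", 1000, 5000), ("chr2", 20000, 30000), ("chr3", 0, 2000)]
calls = [("chr1", 1500, 5200), ("chr2", 29000, 40000)]
s = sensitivity(gold, calls)
print(f"Sensitivity: {s:.3f}")
```

In this example the chr2 call overlaps its gold-standard CNV by only 1 kb of 10 kb, so it does not count as a match. This is one way a stricter matching rule can lower apparent sensitivity for platforms with coarse breakpoint resolution.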
A map of human genome variation from population-scale sequencing
2010; 467 (7319): 1061-1073
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
DOI: 10.1038/nature09534
Web of Science ID: 000283548600039
PubMed ID: 20981092
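The per-base mutation rate in this abstract can be turned into an expected number of new mutations per child. To do so, multiply it by the number of base pairs each child inherits. The sketch assumes a haploid genome length of about 3.1 × 10^9 bp and two inherited copies. Both figures are approximations introduced here, not values from the abstract.

```python
# De novo base substitution rate per bp per generation (from the abstract).
MU = 1e-8
# Approximate haploid human genome length in bp (assumption, not from the abstract).
HAPLOID_BP = 3.1e9


def expected_de_novo(mu: float, haploid_bp: float, copies: int = 2) -> float:
    """Expected de novo substitutions per individual per generation."""
    return mu * haploid_bp * copies


n = expected_de_novo(MU, HAPLOID_BP)
print(f"~{n:.0f} de novo substitutions per child")
```

Under these assumptions the result is about 62, that is, on the order of 60 new base substitutions per genome per generation. Using only the accessible portion of the genome would lower this figure somewhat.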