Honors & Awards
New Scholar Award, Ellison Medical Foundation (2012-2016)
Postdoctoral Fellow, Harvard Medical School, Genomics and Technology
Ph.D., Washington University in St. Louis, Genetics and Computational Biology (2005)
M.S., Tsinghua University, Molecular Biology (1999)
B.S., Tsinghua University, Biology (1997)
Current Research and Scholarly Interests
RNA editing: identification, regulation, and function
- Next Generation Sequencing and Applications
BIOS 201 (Win)
Independent Studies (11)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biophysics
BIOPHYS 399 (Win, Spr)
- Directed Reading in Genetics
GENE 299 (Aut, Win, Spr, Sum)
- Graduate Research
BIOPHYS 300 (Win, Spr)
- Graduate Research
GENE 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
GENE 370 (Aut, Win, Spr, Sum)
- Out-of-Department Advanced Research Laboratory in Experimental Biology
BIO 199X (Spr)
- Supervised Study
GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research
GENE 199 (Aut, Win, Spr, Sum)
Prior Year Courses
- Genetics and Developmental Biology Training Camp
DBIO 200, GENE 200 (Aut)
- Next Generation Sequencing and Applications
BIOS 201 (Win)
Evolutionary analysis reveals regulatory and functional landscape of coding and non-coding RNA editing.
2017; 13 (2)
Adenosine-to-inosine RNA editing diversifies the transcriptome and promotes functional diversity, particularly in the brain. A plethora of editing sites has been recently identified; however, how they are selected and regulated and which are functionally important are largely unknown. Here we show the cis-regulation and stepwise selection of RNA editing during Drosophila evolution and pinpoint a large number of functional editing sites. We found that the establishment of editing and variation in editing levels across Drosophila species are largely explained and predicted by cis-regulatory elements. Furthermore, editing events that arose early in the species tree tend to be more highly edited in clusters and enriched in slowly-evolved neuronal genes, thus suggesting that the main role of RNA editing is for fine-tuning neurological functions. While nonsynonymous editing events have long been recognized as playing a functional role, a large fraction of 3'UTR editing sites is also evolutionarily constrained, highly edited, and thus likely functional. We find that these 3'UTR editing events can alter mRNA stability and affect miRNA binding and thus highlight the functional roles of noncoding RNA editing. Our work, through evolutionary analyses of RNA editing in Drosophila, uncovers novel insights into RNA editing regulation as well as its functions in both coding and non-coding regions.
View details for DOI 10.1371/journal.pgen.1006563
View details for PubMedID 28166241
View details for PubMedCentralID PMC5319793
Dynamic landscape and regulation of RNA editing in mammals.
2017; 550 (7675): 249–54
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.
View details for DOI 10.1038/nature24041
View details for PubMedID 29022589
View details for PubMedCentralID PMC5723435
Regulation of gene expression and RNA editing in Drosophila adapting to divergent microclimates.
2017; 8 (1): 1570
Determining the mechanisms by which a species adapts to its environment is a key endeavor in the study of evolution. In particular, relatively little is known about how transcriptional processes are fine-tuned to adjust to different environmental conditions. Here we study Drosophila melanogaster from 'Evolution Canyon' in Israel, which consists of two opposing slopes with divergent microclimates. We identify several hundred differentially expressed genes and dozens of differentially edited sites between flies from each slope, correlate these changes with genetic differences, and use CRISPR mutagenesis to validate that an intronic SNP in prominin regulates its editing levels. We also demonstrate that while temperature affects editing levels at more sites than genetic differences, genetically regulated sites tend to be less affected by temperature. This work shows the extent to which gene expression and RNA editing differ between flies from different microclimates, and provides insights into the regulation responsible for these differences.
View details for DOI 10.1038/s41467-017-01658-2
View details for PubMedID 29146998
View details for PubMedCentralID PMC5691062
RNA editing by ADAR1 prevents MDA5 sensing of endogenous dsRNA as nonself.
2015; 349 (6252): 1115-1120
Adenosine-to-inosine (A-to-I) editing is a highly prevalent posttranscriptional modification of RNA, mediated by ADAR (adenosine deaminase acting on RNA) enzymes. In addition to RNA editing, additional functions have been proposed for ADAR1. To determine the specific role of RNA editing by ADAR1, we generated mice with an editing-deficient knock-in mutation (Adar1(E861A), where E861A denotes Glu(861)→Ala(861)). Adar1(E861A/E861A) embryos died at ~E13.5 (embryonic day 13.5), with activated interferon and double-stranded RNA (dsRNA)-sensing pathways. Genome-wide analysis of the in vivo substrates of ADAR1 identified clustered hyperediting within long dsRNA stem loops within 3' untranslated regions of endogenous transcripts. Finally, embryonic death and phenotypes of Adar1(E861A/E861A) were rescued by concurrent deletion of the cytosolic sensor of dsRNA, MDA5. A-to-I editing of endogenous dsRNA is the essential function of ADAR1, preventing the activation of the cytosolic dsRNA response by endogenous transcripts.
View details for DOI 10.1126/science.aac7049
View details for PubMedID 26275108
Cis Regulatory Effects on A-to-I RNA Editing in Related Drosophila Species
2015; 11 (5): 697-703
Adenosine-to-inosine RNA editing modifies maturing mRNAs through the binding of adenosine deaminase acting on RNA (Adar) proteins to double-stranded RNA structures in a process critical for neuronal function. Editing levels at individual editing sites span a broad range and are mediated by both cis-acting elements (surrounding RNA sequence and secondary structure) and trans-acting factors. Here, we aim to determine the roles that cis-acting elements and trans-acting factors play in regulating editing levels. Using two closely related Drosophila species, D. melanogaster and D. sechellia, and their F1 hybrids, we dissect the effects of cis sequences from trans regulators on editing levels by comparing species-specific editing in parents and their hybrids. We report that cis sequence differences are largely responsible for editing level differences between these two Drosophila species. This study presents evidence for cis sequence and structure changes as the dominant evolutionary force that modulates RNA editing levels between these Drosophila species.
View details for DOI 10.1016/j.celrep.2015.04.005
View details for Web of Science ID 000353902900004
View details for PubMedID 25921533
View details for PubMedCentralID PMC4418222
Genetic mapping uncovers cis-regulatory landscape of RNA editing.
2015; 6: 8194-?
Adenosine-to-inosine (A-to-I) RNA editing, catalysed by ADAR enzymes conserved in metazoans, plays an important role in neurological functions. Although the fine-tuning mechanism provided by A-to-I RNA editing is important, the underlying rules governing ADAR substrate recognition are not well understood. We apply a quantitative trait loci (QTL) mapping approach to identify genetic variants associated with variability in RNA editing. With very accurate measurement of RNA editing levels at 789 sites in 131 Drosophila melanogaster strains, here we identify 545 editing QTLs (edQTLs) associated with differences in RNA editing. We demonstrate that many edQTLs can act through changes in the local secondary structure for edited dsRNAs. Furthermore, we find that edQTLs located outside of the edited dsRNA duplex are enriched in secondary structure, suggesting that distal dsRNA structure beyond the editing site duplex affects RNA editing efficiency. Our work will facilitate the understanding of the cis-regulatory code of RNA editing.
View details for DOI 10.1038/ncomms9194
View details for PubMedID 26373807
View details for PubMedCentralID PMC4573499
Enhanced Specificity and Efficiency of the CRISPR/Cas9 System with Optimized sgRNA Parameters in Drosophila
2014; 9 (3): 1151-1162
The CRISPR/Cas9 system has recently emerged as a powerful tool for functional genomic studies in Drosophila melanogaster. However, single-guide RNA (sgRNA) parameters affecting the specificity and efficiency of the system in flies are still not clear. Here, we found that off-target effects did not occur in regions of genomic DNA with three or more nucleotide mismatches to sgRNAs. Importantly, we document a strong positive correlation between mutagenesis efficiency and sgRNA GC content of the six protospacer-adjacent motif-proximal nucleotides (PAMPNs). Furthermore, by injecting well-designed sgRNA plasmids at the optimal concentration we determined, we could efficiently generate mutations in four genes in one step. Finally, we generated null alleles of HP1a using optimized parameters through homology-directed repair and achieved an overall mutagenesis rate significantly higher than previously reported. Our work demonstrates a comprehensive optimization of sgRNA and promises to vastly simplify CRISPR/Cas9 experiments in Drosophila.
View details for DOI 10.1016/j.celrep.2014.09.044
View details for Web of Science ID 000344470000034
View details for PubMedCentralID PMC4250831
Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing.
2014; 11 (1): 51-54
We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.
View details for DOI 10.1038/nmeth.2736
View details for PubMedID 24270603
View details for PubMedCentralID PMC3877737
Deciphering the functions and regulation of brain-enriched A-to-I RNA editing.
2013; 16 (11): 1518-1522
Adenosine-to-inosine (A-to-I) RNA editing, in which genomically encoded adenosine is changed to inosine in RNA, is catalyzed by adenosine deaminase acting on RNA (ADAR). This fine-tuning mechanism is critical during normal development and diseases, particularly in relation to brain functions. A-to-I RNA editing has also been hypothesized to be a driving force in human brain evolution. A large number of RNA editing sites have recently been identified, mostly as a result of the development of deep sequencing and bioinformatic analyses. Deciphering the functional consequences of RNA editing events is challenging, but emerging genome engineering approaches may expedite new discoveries. To understand how RNA editing is dynamically regulated, it is imperative to construct a spatiotemporal atlas at the species, tissue and cell levels. Future studies will need to identify the cis and trans regulatory factors that drive the selectivity and frequency of RNA editing. We anticipate that recent technological advancements will aid researchers in acquiring a much deeper understanding of the functions and regulation of RNA editing.
View details for DOI 10.1038/nn.3539
View details for PubMedID 24165678
Reliable Identification of Genomic Variants from RNA-Seq Data.
American journal of human genetics
2013; 93 (4): 641-651
Identifying genomic variation is a crucial step for unraveling the relationship between genotype and phenotype and can yield important insights into human diseases. Prevailing methods rely on cost-intensive whole-genome sequencing (WGS) or whole-exome sequencing (WES) approaches while the identification of genomic variants from often existing RNA sequencing (RNA-seq) data remains a challenge because of the intrinsic complexity in the transcriptome. Here, we present a highly accurate approach termed SNPiR to identify SNPs in RNA-seq data. We applied SNPiR to RNA-seq data of samples for which WGS and WES data are also available and achieved high specificity and sensitivity. Of the SNPs called from the RNA-seq data, >98% were also identified by WGS or WES. Over 70% of all expressed coding variants were identified from RNA-seq, and comparable numbers of exonic variants were identified in RNA-seq and WES. Despite our method's limitation in detecting variants in expressed regions only, our results demonstrate that SNPiR outperforms current state-of-the-art approaches for variant detection from RNA-seq data and offers a cost-effective and reliable alternative for SNP discovery.
View details for DOI 10.1016/j.ajhg.2013.08.008
View details for PubMedID 24075185
Identifying RNA editing sites using RNA sequencing data alone
2013; 10 (2): 128-132
We show that RNA editing sites can be called with high confidence using RNA sequencing data from multiple samples across either individuals or species, without the need for matched genomic DNA sequence. We identified many previously unidentified editing sites in both humans and Drosophila; our results nearly double the known number of human protein recoding events. We also found that human genes harboring conserved editing sites within Alu repeats are enriched for neuronal functions.
View details for DOI 10.1038/NMETH.2330
View details for Web of Science ID 000314623900018
View details for PubMedID 23291724
View details for PubMedCentralID PMC3676881
RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development
2013; 23 (1): 201-216
The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.
View details for DOI 10.1101/gr.141424.112
View details for Web of Science ID 000312963400019
View details for PubMedID 22960373
View details for PubMedCentralID PMC3530680
Accurate identification of human Alu and non-Alu RNA editing sites
2012; 9 (6): 579-?
We developed a computational framework to robustly identify RNA editing sites using transcriptome and genome deep-sequencing data from the same individual. As compared with previous methods, our approach identified a large number of Alu and non-Alu RNA editing sites with high specificity. We also found that editing of non-Alu sites appears to be dependent on nearby edited Alu sites, possibly through the locally formed double-stranded RNA structure.
View details for DOI 10.1038/NMETH.1982
View details for Web of Science ID 000304778500021
View details for PubMedID 22484847
View details for PubMedCentralID PMC3662811
Comment on "Widespread RNA and DNA Sequence Differences in the Human Transcriptome"
2012; 335 (6074)
Li et al. (Research Articles, 1 July 2011, p. 53; published online 19 May 2011) reported widespread differences between the RNA and DNA sequences of the same human cells, including all 12 possible mismatch types. Before accepting such a fundamental claim, a deeper analysis of the sequencing data is required to discern true differences between RNA and DNA from potential artifacts.
View details for DOI 10.1126/science.1210624
View details for Web of Science ID 000301531600026
View details for PubMedID 22422964
Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing
2009; 324 (5931): 1210-1213
Adenosine-to-inosine (A-to-I) RNA editing leads to transcriptome diversity and is important for normal brain function. To date, only a handful of functional sites have been identified in mammals. We developed an unbiased assay to screen more than 36,000 computationally predicted nonrepetitive A-to-I sites using massively parallel target capture and DNA sequencing. A comprehensive set of several hundred human RNA editing sites was detected by comparing genomic DNA with RNAs from seven tissues of a single individual. Specificity of our profiling was supported by observations of enrichment with known features of targets of adenosine deaminases acting on RNA (ADAR) and validation by means of capillary sequencing. This efficient approach greatly expands the repertoire of RNA editing targets and can be applied to studies involving RNA editing-related human diseases.
View details for DOI 10.1126/science.1170995
View details for Web of Science ID 000266410100049
View details for PubMedID 19478186
Comparative genomics identifies a flagellar and basal body proteome that includes the BBS5 human disease gene
2004; 117 (4): 541-552
Cilia and flagella are microtubule-based structures nucleated by modified centrioles termed basal bodies. These biochemically complex organelles have more than 250 and 150 polypeptides, respectively. To identify the proteins involved in ciliary and basal body biogenesis and function, we undertook a comparative genomics approach that subtracted the nonflagellated proteome of Arabidopsis from the shared proteome of the ciliated/flagellated organisms Chlamydomonas and human. We identified 688 genes that are present exclusively in organisms with flagella and basal bodies and validated these data through a series of in silico, in vitro, and in vivo studies. We then applied this resource to the study of human ciliation disorders and have identified BBS5, a novel gene for Bardet-Biedl syndrome. We show that this novel protein localizes to basal bodies in mouse and C. elegans, is under the regulatory control of daf-19, and is necessary for the generation of both cilia and flagella.
View details for Web of Science ID 000221458000013
View details for PubMedID 15137946
Updates to the RNA mapping database (RMDB), version 2
Nucleic acids research
2018; 46 (D1): D375–D379
A-to-I RNA editing in the rat brain is age-dependent, region-specific and sensitive to environmental stress across generations.
2018; 19 (1): 28
Adenosine-to-inosine (A-to-I) RNA editing is an epigenetic modification catalyzed by adenosine deaminases acting on RNA (ADARs), and is especially prevalent in the brain. We used the highly accurate microfluidics-based multiplex PCR sequencing (mmPCR-seq) technique to assess the effects of development and environmental stress on A-to-I editing at 146 pre-selected, conserved sites in the rat prefrontal cortex and amygdala. Furthermore, we asked whether changes in editing can be observed in offspring of stress-exposed rats. In parallel, we assessed changes in ADARs expression levels. In agreement with previous studies, we found editing to be generally higher in adult compared to neonatal rat brain. At birth, editing was generally lower in prefrontal cortex than in amygdala. Stress affected editing at the serotonin receptor 2c (Htr2c), and editing at this site was significantly altered in offspring of rats exposed to prereproductive stress across two generations. Stress-induced changes in Htr2c editing measured with mmPCR-seq were comparable to changes measured with Sanger and Illumina sequencing. Developmental and stress-induced changes in Adar and Adarb1 mRNA expression were observed but did not correlate with editing changes. Our findings indicate that mmPCR-seq can accurately detect A-to-I RNA editing in rat brain samples, and confirm previous accounts of a developmental increase in RNA editing rates. Our findings also point to stress in adolescence as an environmental factor that alters RNA editing patterns several generations forward, joining a growing body of literature describing the transgenerational effects of stress.
View details for DOI 10.1186/s12864-017-4409-8
View details for PubMedID 29310578
View details for PubMedCentralID PMC5759210
Abnormalities in A-to-I RNA editing patterns in CNS injuries correlate with dynamic changes in cell type composition
Scientific reports
2017; 7
Molecular definition of a metastatic lung cancer state reveals a targetable CD109-Janus kinase-Stat axis.
2017; 23 (3): 291-300
Lung cancer is the leading cause of cancer deaths worldwide, with the majority of mortality resulting from metastatic spread. However, the molecular mechanism by which cancer cells acquire the ability to disseminate from primary tumors, seed distant organs, and grow into tissue-destructive metastases remains incompletely understood. We combined tumor barcoding in a mouse model of human lung adenocarcinoma with unbiased genomic approaches to identify a transcriptional program that confers metastatic ability and predicts patient survival. Small-scale in vivo screening identified several genes, including Cd109, that encode novel pro-metastatic factors. We uncovered signaling mediated by Janus kinases (Jaks) and the transcription factor Stat3 as a critical, pharmacologically targetable effector of CD109-driven lung cancer metastasis. In summary, by coupling the systematic genomic analysis of purified cancer cells in distinct malignant states from mouse models with extensive human validation, we uncovered several key regulators of metastatic ability, including an actionable pro-metastatic CD109-Jak-Stat3 axis.
View details for DOI 10.1038/nm.4285
View details for PubMedID 28191885
Deficiency of microRNA miR-34a expands cell fate potential in pluripotent stem cells
2017; 355 (6325): 596-?
Embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) efficiently generate all embryonic cell lineages but rarely generate extraembryonic cell types. We found that microRNA miR-34a deficiency expands the developmental potential of mouse pluripotent stem cells, yielding both embryonic and extraembryonic lineages and strongly inducing MuERV-L (MERVL) endogenous retroviruses, similar to what is seen with features of totipotent two-cell blastomeres. miR-34a restricts the acquisition of expanded cell fate potential in pluripotent stem cells, and it represses MERVL expression through transcriptional regulation, at least in part by targeting the transcription factor Gata2. Our studies reveal a complex molecular network that defines and restricts pluripotent developmental potential in cultured ESCs and iPSCs.
View details for DOI 10.1126/science.aag1927
View details for Web of Science ID 000393636700040
DDX6 Represses Aberrant Activation of Interferon-Stimulated Genes.
2017; 20 (4): 819–31
The innate immune system tightly regulates activation of interferon-stimulated genes (ISGs) to avoid inappropriate expression. Pathological ISG activation resulting from aberrant nucleic acid metabolism has been implicated in autoimmune disease; however, the mechanisms governing ISG suppression are unknown. Through a genome-wide genetic screen, we identified DEAD-box helicase 6 (DDX6) as a suppressor of ISGs. Genetic ablation of DDX6 induced global upregulation of ISGs and other immune genes. ISG upregulation proved cell intrinsic, imposing an antiviral state and making cells refractory to divergent families of RNA viruses. Epistatic analysis revealed that ISG activation could not be overcome by deletion of canonical RNA sensors. However, DDX6 deficiency was suppressed by disrupting LSM1, a core component of mRNA degradation machinery, suggesting that dysregulation of RNA processing underlies ISG activation in the DDX6 mutant. DDX6 is distinct among DExD/H helicases that regulate the antiviral response in its singular ability to negatively regulate immunity.
View details for DOI 10.1016/j.celrep.2017.06.085
View details for PubMedID 28746868
View details for PubMedCentralID PMC5551412
Landscape of X chromosome inactivation across human tissues.
2017; 550 (7675): 244–48
X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.
View details for DOI 10.1038/nature24265
View details for PubMedID 29022598
View details for PubMedCentralID PMC5685192
The evolution and adaptation of A-to-I RNA editing.
2017; 13 (11): e1007064
Adenosine-to-inosine (A-to-I) RNA editing is an important post-transcriptional modification that affects the information encoded from DNA to RNA to protein. RNA editing can generate a multitude of transcript isoforms and can potentially be used to optimize protein function in response to varying conditions. In light of this and the fact that millions of editing sites have been identified in many different species, it is interesting to examine the extent to which these sites have evolved to be functionally important. In this review, we discuss results pertaining to the evolution of RNA editing, specifically in humans, cephalopods, and Drosophila. We focus on how comparative genomics approaches have aided in the identification of sites that are likely to be advantageous. The use of RNA editing as a mechanism to adapt to varying environmental conditions will also be reviewed.
View details for DOI 10.1371/journal.pgen.1007064
View details for PubMedID 29182635
View details for PubMedCentralID PMC5705066
Rewriting the transcriptome: adenosine-to-inosine RNA editing by ADARs.
2017; 18 (1): 205
One of the most prevalent forms of post-transcriptional RNA modification is the conversion of adenosine nucleosides to inosine (A-to-I), mediated by the ADAR family of enzymes. The functional requirement and regulatory landscape for the majority of A-to-I editing events are, at present, uncertain. Recent studies have identified key in vivo functions of ADAR enzymes, informing our understanding of the biological importance of A-to-I editing. Large-scale studies have revealed how editing is regulated both in cis and in trans. This review will explore these recent studies and how they broaden our understanding of the functions and regulation of ADAR-mediated RNA editing.
View details for DOI 10.1186/s13059-017-1347-3
View details for PubMedID 29084589
View details for PubMedCentralID PMC5663115
Protein recoding by ADAR1-mediated RNA editing is not essential for normal development and homeostasis.
2017; 18 (1): 166
Adenosine-to-inosine (A-to-I) editing of dsRNA by ADAR proteins is a pervasive epitranscriptome feature. Tens of thousands of A-to-I editing events are defined in the mouse, yet the functional impact of most is unknown. Editing causing protein recoding is the essential function of ADAR2, but an essential role for recoding by ADAR1 has not been demonstrated. ADAR1 has been proposed to have editing-dependent and editing-independent functions. The relative contribution of these in vivo has not been clearly defined. A critical function of ADAR1 is editing of endogenous RNA to prevent activation of the dsRNA sensor MDA5 (Ifih1). Outside of this, how ADAR1 editing contributes to normal development and homeostasis is uncertain. We describe the consequences of ADAR1 editing deficiency on murine homeostasis. Adar1 E861A/E861A Ifih1 -/- mice are strikingly normal, including their lifespan. There is a mild, non-pathogenic innate immune activation signature in the Adar1 E861A/E861A Ifih1 -/- mice. Assessing A-to-I editing across adult tissues demonstrates that outside of the brain, ADAR1 performs the majority of editing and that ADAR2 cannot compensate in its absence. Direct comparison of the Adar1 -/- and Adar1 E861A/E861A alleles demonstrates a high degree of concordance on both Ifih1 +/+ and Ifih1 -/- backgrounds, suggesting no substantial contribution from ADAR1 editing-independent functions. These analyses demonstrate that the lifetime absence of ADAR1-editing is well tolerated in the absence of MDA5. We conclude that protein recoding arising from ADAR1-mediated editing is not essential for organismal homeostasis. Additionally, the phenotypes associated with loss of ADAR1 are the result of RNA editing and MDA5-dependent functions.
View details for DOI 10.1186/s13059-017-1301-4
View details for PubMedID 28874170
View details for PubMedCentralID PMC5585977
Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease.
2017; 49 (12): 1664–70
Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.
View details for DOI 10.1038/ng.3969
View details for PubMedID 29019975
Co-expression networks reveal the tissue-specific regulation of transcription and splicing.
2017; 27 (11): 1843–58
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
View details for DOI 10.1101/gr.216721.116
View details for PubMedID 29021288
View details for PubMedCentralID PMC5668942
Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis.
2017; 27 (11): 1859–71
The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTLs mechanism is "mediation" by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are "cis-mediators" of trans-eQTLs, including those "cis-hubs" involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provides a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.
View details for DOI 10.1101/gr.216754.116
View details for PubMedID 29021290
View details for PubMedCentralID PMC5668943
The impact of rare variation on gene expression across tissues.
2017; 550 (7675): 239–43
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
View details for DOI 10.1038/nature24267
View details for PubMedID 29022581
Genetic effects on gene expression across human tissues.
2017; 550 (7675): 204–13
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
View details for DOI 10.1038/nature24277
View details for PubMedID 29022597
Adenosine-to-inosine RNA editing by ADAR1 is essential for normal murine erythropoiesis
2016; 44 (10): 947-963
Adenosine deaminases that act on RNA (ADARs) convert adenosine residues to inosine in double-stranded RNA. In vivo, ADAR1 is essential for the maintenance of hematopoietic stem/progenitors. Whether other hematopoietic cell types also require ADAR1 has not been assessed. Using erythroid- and myeloid-restricted deletion of Adar1, we demonstrate that ADAR1 is dispensable for myelopoiesis but is essential for normal erythropoiesis. Adar1-deficient erythroid cells display a profound activation of innate immune signaling and high levels of cell death. No changes in microRNA levels were found in ADAR1-deficient erythroid cells. Using an editing-deficient allele, we demonstrate that RNA editing is the essential function of ADAR1 during erythropoiesis. Mapping of adenosine-to-inosine editing in purified erythroid cells identified clusters of hyperedited adenosines located in long 3'-untranslated regions of erythroid-specific transcripts and these are ADAR1-specific editing events. ADAR1-mediated RNA editing is essential for normal erythropoiesis.
View details for DOI 10.1016/j.exphem.2016.06.250
View details for Web of Science ID 000384276100010
View details for PubMedID 27373493
Identification of human RNA editing sites: A historical perspective.
Methods (San Diego, Calif.)
2016; 107: 42-47
A-to-I RNA editing is an essential gene regulatory mechanism. Once thought to be a rare phenomenon only occurring in a few transcripts, the emergence of high-throughput RNA sequencing has facilitated the identification of over 2 million RNA editing sites in the human transcriptome. In this review, we survey the current RNA-seq based methods as well as historical methods used to identify RNA editing sites.
View details for DOI 10.1016/j.ymeth.2016.05.011
View details for PubMedID 27208508
XenMine: A genomic interaction tool for the Xenopus community.
The Xenopus community has embraced recent advances in sequencing technology, resulting in the accumulation of numerous RNA-Seq and ChIP-Seq datasets. However, easily accessing and comparing datasets generated by multiple laboratories is challenging. Thus, we have created a central space to view, search and analyze data, providing essential information on gene expression changes and regulatory elements present in the genome. XenMine (www.xenmine.org) is a user-friendly website containing published genomic datasets from both Xenopus tropicalis and Xenopus laevis. We have established an analysis pipeline where all published datasets are uniformly processed with the latest genome releases. Information from these datasets can be extracted and compared using an array of pre-built or custom templates. With these search tools, users can easily extract sequences for all putative regulatory domains surrounding a gene of interest, identify the expression values of a gene of interest over developmental time, and analyze lists of genes for gene ontology terms and publications. Additionally, XenMine hosts an in-house genome browser that allows users to visualize all available ChIP-Seq data, extract specifically marked sequences, and aid in identifying important regulatory elements within the genome. Altogether, XenMine is an excellent tool for visualizing, accessing and querying analyzed datasets rapidly and efficiently.
View details for DOI 10.1016/j.ydbio.2016.02.034
View details for PubMedID 27157655
- Editing of Cellular Self-RNAs by Adenosine Deaminase ADAR1 Suppresses Innate Immune Stress Responses JOURNAL OF BIOLOGICAL CHEMISTRY 2016; 291 (12): 6158-6168
- The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers CANCER CELL 2015; 28 (4)
The landscape of genomic imprinting across diverse adult human tissues
2015; 25 (7): 927-936
Genomic imprinting is an important regulatory mechanism that silences one of the parental copies of a gene. To systematically characterize this phenomenon, we analyze tissue specificity of imprinting from allelic expression data in 1582 primary tissue samples from 178 individuals from the Genotype-Tissue Expression (GTEx) project. We characterize imprinting in 42 genes, including both novel and previously identified genes. Tissue specificity of imprinting is widespread, and gender-specific effects are revealed in a small number of genes in muscle with stronger imprinting in males. IGF2 shows maternal expression in the brain instead of the canonical paternal expression elsewhere. Imprinting appears to have only a subtle impact on tissue-specific expression levels, with genes lacking a systematic expression difference between tissues with imprinted and biallelic expression. In summary, our systematic characterization of imprinting in adult tissues highlights variation in imprinting between genes, individuals, and tissues.
View details for DOI 10.1101/gr.192278.115
View details for Web of Science ID 000357356900001
View details for PubMedID 25953952
View details for PubMedCentralID PMC4484390
- Effect of predicted protein-truncating genetic variants on the human transcriptome SCIENCE 2015; 348 (6235): 666-669
Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse.
2015; 47 (5): 544-549
Genomic imprinting is an epigenetic process that restricts gene expression to either the maternally or paternally inherited allele. Many theories have been proposed to explain its evolutionary origin, but understanding has been limited by a paucity of data mapping the breadth and dynamics of imprinting within any organism. We generated an atlas of imprinting spanning 33 mouse and 45 human developmental stages and tissues. Nearly all imprinted genes were imprinted in early development and either retained their parent-of-origin expression in adults or lost it completely. Consistent with an evolutionary signature of parental conflict, imprinted genes were enriched for coexpressed pairs of maternally and paternally expressed genes, showed accelerated expression divergence between human and mouse, and were more highly expressed than their non-imprinted orthologs in other species. Our approach demonstrates a general framework for the discovery of imprinting in any species and sheds light on the causes and consequences of genomic imprinting in mammals.
View details for DOI 10.1038/ng.3274
View details for PubMedID 25848752
View details for PubMedCentralID PMC4414907
The role of Abcb5 alleles in susceptibility to haloperidol-induced toxicity in mice and humans.
2015; 12 (2)
We know very little about the genetic factors affecting susceptibility to drug-induced central nervous system (CNS) toxicities, and this has limited our ability to optimally utilize existing drugs or to develop new drugs for CNS disorders. For example, haloperidol is a potent dopamine antagonist that is used to treat psychotic disorders, but 50% of treated patients develop characteristic extrapyramidal symptoms caused by haloperidol-induced toxicity (HIT), which limits its clinical utility. We do not have any information about the genetic factors affecting this drug-induced toxicity. HIT in humans is directly mirrored in a murine genetic model, where inbred mouse strains are differentially susceptible to HIT. Therefore, we genetically analyzed this murine model and performed a translational human genetic association study. A whole genome SNP database and computational genetic mapping were used to analyze the murine genetic model of HIT. Guided by the mouse genetic analysis, we demonstrate that genetic variation within an ABC-drug efflux transporter (Abcb5) affected susceptibility to HIT. In situ hybridization results reveal that Abcb5 is expressed in brain capillaries, and by cerebellar Purkinje cells. We also analyzed chromosome substitution strains, imaged haloperidol abundance in brain tissue sections and directly measured haloperidol (and its metabolite) levels in brain, and characterized Abcb5 knockout mice. Our results demonstrate that Abcb5 is part of the blood-brain barrier; it affects susceptibility to HIT by altering the brain concentration of haloperidol. Moreover, a genetic association study in a haloperidol-treated human cohort indicates that human ABCB5 alleles had a time-dependent effect on susceptibility to individual and combined measures of HIT.
Abcb5 alleles are pharmacogenetic factors that affect susceptibility to HIT, but it is likely that additional pharmacogenetic susceptibility factors will be discovered. ABCB5 alleles alter susceptibility to HIT in mouse and humans. This discovery leads to a new model that (at least in part) explains inter-individual differences in susceptibility to a drug-induced CNS toxicity.
View details for DOI 10.1371/journal.pmed.1001782
View details for PubMedID 25647612
View details for PubMedCentralID PMC4315575
- Novel RNA Modifications in the Nervous System: Form and Function JOURNAL OF NEUROSCIENCE 2014; 34 (46): 15170-15177
- Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues PLOS GENETICS 2014; 10 (5)
A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes
2014; 24 (3): 365-376
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
View details for DOI 10.1101/gr.164749.113
View details for Web of Science ID 000332246100001
View details for PubMedID 24347612
View details for PubMedCentralID PMC3941102
- RADAR: a rigorously annotated database of A-to-I RNA editing NUCLEIC ACIDS RESEARCH 2014; 42 (D1): D109-D113
Comparative RNA editing in autistic and neurotypical cerebella
2013; 18 (9): 1041-1048
Adenosine-to-inosine (A-to-I) RNA editing is a neurodevelopmentally regulated epigenetic modification shown to modulate complex behavior in animals. Little is known about human A-to-I editing, but it is thought to constitute one of many molecular mechanisms connecting environmental stimuli and behavioral outputs. Thus, comprehensive exploration of A-to-I RNA editing in human brains may shed light on gene-environment interactions underlying complex behavior in health and disease. Synaptic function is a main target of A-to-I editing, which can selectively recode key amino acids in synaptic genes, directly altering synaptic strength and duration in response to environmental signals. Here, we performed a high-resolution survey of synaptic A-to-I RNA editing in a human population, and examined how it varies in autism, a neurodevelopmental disorder in which synaptic abnormalities are a common finding. Using ultra-deep (>1000×) sequencing, we quantified the levels of A-to-I editing of 10 synaptic genes in postmortem cerebella from 14 neurotypical and 11 autistic individuals. A high dynamic range of editing levels was detected across individuals and editing sites, from 99.6% to below detection limits. In most sites, the extreme ends of the population editing distributions were individuals with autism. Editing was correlated with isoform usage, clusters of correlated sites were identified, and differential editing patterns examined. Finally, a dysfunctional form of the editing enzyme adenosine deaminase acting on RNA B1 was found more commonly in postmortem cerebella from individuals with autism. These results provide a population-level, high-resolution view of A-to-I RNA editing in human cerebella and suggest that A-to-I editing of synaptic genes may be informative for assessing the epigenetic risk for autism. Molecular Psychiatry advance online publication, 7 August 2012; doi:10.1038/mp.2012.118.
View details for DOI 10.1038/mp.2012.118
View details for Web of Science ID 000323595300015
View details for PubMedID 22869036
View details for PubMedCentralID PMC3494744
- Lack of evidence for existence of noncanonical RNA editing NATURE BIOTECHNOLOGY 2013; 31 (1): 19-20
Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (52): 21301-21306
A host of observations demonstrating the relationship between nuclear architecture and processes such as gene expression have led to a number of new technologies for interrogating chromosome positioning. Whereas some of these technologies reconstruct intermolecular interactions, others have enhanced our ability to visualize chromosomes in situ. Here, we describe an oligonucleotide- and PCR-based strategy for fluorescence in situ hybridization (FISH) and a bioinformatic platform that enables this technology to be extended to any organism whose genome has been sequenced. The oligonucleotide probes are renewable, highly efficient, and able to robustly label chromosomes in cell culture, fixed tissues, and metaphase spreads. Our method gives researchers precise control over the sequences they target and allows for single and multicolor imaging of regions ranging from tens of kilobases to megabases with the same basic protocol. We anticipate this technology will lead to an enhanced ability to visualize interphase and metaphase chromosomes.
View details for DOI 10.1073/pnas.1213818110
View details for Web of Science ID 000313627700041
View details for PubMedID 23236188
- The difficult calls in RNA editing NATURE BIOTECHNOLOGY 2012; 30 (12): 1207-1209
Activity-Dependent A-to-I RNA Editing in Rat Cortical Neurons
2012; 192 (1): 281-U569
Changes in neural activity influence synaptic plasticity/scaling, gene expression, and epigenetic modifications. We present the first evidence that short-term and persistent changes in neural activity can alter adenosine-to-inosine (A-to-I) RNA editing, a post-transcriptional site-specific modification found in several neuron-specific transcripts. In rat cortical neuron cultures, activity-dependent changes in A-to-I RNA editing in coding exons are present after 6 hr of high potassium depolarization but not after 1 hr and require calcium entry into neurons. When treatments are extended from hours to days, we observe a negative feedback phenomenon: Chronic depolarization increases editing at many sites and chronic silencing decreases editing. We present several different modulations of neural activity that change the expression of different mRNA isoforms through editing.
View details for DOI 10.1534/genetics.112.141200
View details for Web of Science ID 000309001800021
View details for PubMedID 22714409
A public resource facilitating clinical use of genomes
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (30): 11920-11927
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
View details for DOI 10.1073/pnas.1201904109
View details for Web of Science ID 000306992700018
View details for PubMedID 22797899
View details for PubMedCentralID PMC3409785
Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips
2010; 28 (12): 1295-U108
Development of cheap, high-throughput and reliable gene synthesis methods will broadly stimulate progress in biology and biotechnology. Currently, the reliance on column-synthesized oligonucleotides as a source of DNA limits further cost reductions in gene synthesis. Oligonucleotides from DNA microchips can reduce costs by at least an order of magnitude, yet efforts to scale their use have been largely unsuccessful owing to the high error rates and complexity of the oligonucleotide mixtures. Here we use high-fidelity DNA microchips, selective oligonucleotide pool amplification, optimized gene assembly protocols and enzymatic error correction to develop a method for highly parallel gene synthesis. We tested our approach by assembling 47 genes, including 42 challenging therapeutic antibody sequences, encoding a total of ∼35 kilobase pairs of DNA. These assemblies were performed from a complex background containing 13,000 oligonucleotides encoding ∼2.5 megabases of DNA, which is at least 50 times larger than in previously published attempts.
View details for DOI 10.1038/nbt.1716
View details for Web of Science ID 000285088400024
View details for PubMedID 21113165
Sequence based identification of RNA editing sites
2010; 7 (2): 248-252
RNA editing diversifies the human transcriptome beyond the genomic repertoire. Recent years have proven a strategy based on genomics and computational sequence analysis as a powerful tool for identification and characterization of RNA editing. In particular, analysis of the human transcriptome has resulted in the identification of thousands of A-to-I editing sites within genomic repeats, as well as a few hundred sites located outside repeats. We review these recent advancements, emphasizing the principles underlying the various methods used. Possible directions for extending these methods are discussed.
View details for Web of Science ID 000282761300020
View details for PubMedID 20215866
A Robust Approach to Identifying Tissue-Specific Gene Expression Regulatory Variants Using Personalized Human Induced Pluripotent Stem Cells
2009; 5 (11)
Normal variation in gene expression due to regulatory polymorphisms is often masked by biological and experimental noise. In addition, some regulatory polymorphisms may become apparent only in specific tissues. We derived human induced pluripotent stem (iPS) cells from adult skin primary fibroblasts and attempted to detect tissue-specific cis-regulatory variants using in vitro cell differentiation. We used padlock probes and high-throughput sequencing for digital RNA allelotyping and measured allele-specific gene expression in primary fibroblasts, lymphoblastoid cells, iPS cells, and their differentiated derivatives. We show that allele-specific expression is both cell type and genotype-dependent, but the majority of detectable allele-specific expression loci remains consistent despite large changes in the cell type or the experimental condition following iPS reprogramming, except on the X-chromosome. We show that our approach to mapping cis-regulatory variants reduces in vitro experimental noise and reveals additional tissue-specific variants using skin-derived human iPS cells.
View details for DOI 10.1371/journal.pgen.1000718
View details for Web of Science ID 000272419500014
View details for PubMedID 19911041
Multiplex padlock targeted sequencing reveals human hypermutable CpG variations
2009; 19 (9): 1606-1615
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
View details for DOI 10.1101/gr.092213.109
View details for Web of Science ID 000269482200011
View details for PubMedID 19525355
Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human
2009; 6 (8): 613-U90
We developed a digital RNA allelotyping method for quantitatively interrogating allele-specific gene expression. This method involves ultra-deep sequencing of padlock-captured single-nucleotide polymorphisms (SNPs) from the transcriptome. We characterized four cell lines established from two human subjects in the Personal Genome Project. Approximately 11-22% of the heterozygous mRNA-associated SNPs showed allele-specific expression in each cell line and 4.3-8.5% were tissue-specific, suggesting the presence of tissue-specific cis regulation. When we applied allelotyping to two pairs of sibling human embryonic stem cell lines, the sibling lines were more similar in allele-specific expression than were the genetically unrelated lines. We found that the variation of allelic ratios in gene expression among different cell lines was primarily explained by genetic variations, much more so than by specific tissue types or growth conditions. Comparison of expressed SNPs on the sense and antisense transcripts suggested that allelic ratios are primarily determined by cis-regulatory mechanisms on the sense transcripts.
View details for DOI 10.1038/nmeth.1357
View details for Web of Science ID 000268493700024
View details for PubMedID 19620972
Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells
2009; 27 (4): 361-368
Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed approximately 10,000 bisulfite padlock probes to profile approximately 7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for approximately 1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.
View details for DOI 10.1038/nbt.1533
View details for Web of Science ID 000264971800022
View details for PubMedID 19329998
Multiplex amplification of large sets of human exons
2007; 4 (11): 931-936
A new generation of technologies is poised to reduce DNA sequencing costs by several orders of magnitude. But our ability to fully leverage the power of these technologies is crippled by the absence of suitable 'front-end' methods for isolating complex subsets of a mammalian genome at a scale that matches the throughput at which these platforms will routinely operate. We show that targeting oligonucleotides released from programmable microarrays can be used to capture and amplify approximately 10,000 human exons in a single multiplex reaction. Additionally, we show integration of this protocol with ultra-high-throughput sequencing for targeted variation discovery. Although the multiplex capture reaction is highly specific, we found that nonuniform capture is a key issue that will need to be resolved by additional optimization. We anticipate that highly multiplexed methods for targeted amplification will enable the comprehensive resequencing of human exons at a fraction of the cost of whole-genome resequencing.
View details for DOI 10.1038/NMETH1110
View details for Web of Science ID 000250575700018
View details for PubMedID 17934468
Procom: a web-based tool to compare multiple eukaryotic proteomes
2005; 21 (8): 1693-1694
Each organism has traits that are shared with some, but not all, organisms. Identification of genes needed for a particular trait can be accomplished by a comparative genomics approach using three or more organisms. Genes that occur in organisms without the trait are removed from the set of genes in common among organisms with the trait. To facilitate these comparisons, a web-based server, Procom, was developed to identify the subset of genes that may be needed for a trait. The Procom program is freely available with documentation and examples at http://ural.wustl.edu/~billy/Procom.
View details for DOI 10.1093/bioinformatics/bti161
View details for Web of Science ID 000228401800058
View details for PubMedID 15564299
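The subtraction logic described in the Procom abstract above can be illustrated with plain set operations. This is only a minimal sketch, not Procom's actual implementation: it assumes each organism's genes have already been mapped to shared family identifiers (the names `fam1`, `organism_A`, and the function `trait_candidates` are hypothetical), whereas a real comparison of proteomes would first have to establish which proteins correspond across organisms.

```python
def trait_candidates(proteomes, with_trait, without_trait):
    """Return gene families shared by every trait-bearing organism
    and absent from every organism that lacks the trait.

    proteomes: dict mapping organism name -> set of gene-family IDs
               (hypothetical, pre-computed identifiers).
    """
    if not with_trait:
        raise ValueError("need at least one organism with the trait")
    # Genes in common among all organisms that have the trait.
    shared = set.intersection(*(set(proteomes[o]) for o in with_trait))
    # Genes that occur in any organism without the trait.
    excluded = set().union(*(set(proteomes[o]) for o in without_trait))
    # Remove the second set from the first.
    return shared - excluded


# Toy data: three organisms, two of which have the trait.
proteomes = {
    "organism_A": {"fam1", "fam2", "fam3", "fam4"},
    "organism_B": {"fam1", "fam2", "fam3"},
    "organism_C": {"fam1", "fam4"},
}
candidates = trait_candidates(
    proteomes,
    with_trait=["organism_A", "organism_B"],
    without_trait=["organism_C"],
)
print(sorted(candidates))
```

With this toy input, `fam1` is dropped because the organism lacking the trait also has it, and `fam4` is dropped because not every trait-bearing organism has it.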
Analysis of Chlamydomonas reinhardtii genome structure using large-scale sequencing of regions on linkage groups I and III
JOURNAL OF EUKARYOTIC MICROBIOLOGY
2003; 50 (3): 145-155
Chlamydomonas reinhardtii is a unicellular green alga that has been used as a model organism for the study of flagella and basal bodies as well as photosynthesis. This report analyzes finished genomic DNA sequence for 0.5% of the nuclear genome. We have used three gene prediction programs as well as EST and protein homology data to estimate the total number of genes in Chlamydomonas to be between 12,000 and 16,400. Chlamydomonas appears to have many more genes than any other unicellular organism sequenced to date. Twenty-seven percent of the predicted genes have significant identity to both ESTs and to known proteins in other organisms, 32% of the predicted genes have significant identity to ESTs alone, and 14% have significant similarity to known proteins in other organisms. For gene prediction in Chlamydomonas, GreenGenie appeared to have the highest sensitivity and specificity at the exon level, scoring 71% and 82%, respectively. Two new alternative splicing events were predicted by aligning Chlamydomonas ESTs to the genomic sequence. Finally, recombination differs between the two sequenced contigs. The 350-kb of the Linkage group III contig is devoid of recombination, while the Linkage group I contig is 30 map units long over 33-kb.
View details for Web of Science ID 000183473600001
View details for PubMedID 12836870