Honors & Awards
New Scholar Award, Ellison Medical Foundation (2012-2016)
Postdoctoral Fellow, Harvard Medical School, Genomics and Technology
Ph.D., Washington University in St. Louis, Genetics and Computational Biology (2005)
M.S., Tsinghua University, Molecular Biology (1999)
B.S., Tsinghua University, Biology (1997)
Current Research and Scholarly Interests
RNA editing: identification, regulation, and function
- Genetics and Developmental Biology Training Camp
GENE 200 (Aut)
- Next Generation Sequencing and Applications
BIOS 201 (Win)
Independent Studies (8)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics
GENE 299 (Aut, Win, Spr, Sum)
- Graduate Research
GENE 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
GENE 370 (Aut, Win, Spr, Sum)
- Supervised Study
GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research
GENE 199 (Aut, Win, Spr, Sum)
Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing.
2014; 11 (1): 51-4
We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.
View details for DOI 10.1038/nmeth.2736
View details for PubMedID 24270603
RADAR: a rigorously annotated database of A-to-I RNA editing
NUCLEIC ACIDS RESEARCH
2014; 42 (D1): D109-D113
Deciphering the functions and regulation of brain-enriched A-to-I RNA editing.
2013; 16 (11): 1518-1522
Adenosine-to-inosine (A-to-I) RNA editing, in which genomically encoded adenosine is changed to inosine in RNA, is catalyzed by adenosine deaminase acting on RNA (ADAR). This fine-tuning mechanism is critical during normal development and diseases, particularly in relation to brain functions. A-to-I RNA editing has also been hypothesized to be a driving force in human brain evolution. A large number of RNA editing sites have recently been identified, mostly as a result of the development of deep sequencing and bioinformatic analyses. Deciphering the functional consequences of RNA editing events is challenging, but emerging genome engineering approaches may expedite new discoveries. To understand how RNA editing is dynamically regulated, it is imperative to construct a spatiotemporal atlas at the species, tissue and cell levels. Future studies will need to identify the cis and trans regulatory factors that drive the selectivity and frequency of RNA editing. We anticipate that recent technological advancements will aid researchers in acquiring a much deeper understanding of the functions and regulation of RNA editing.
View details for DOI 10.1038/nn.3539
View details for PubMedID 24165678
Reliable Identification of Genomic Variants from RNA-Seq Data.
American journal of human genetics
2013; 93 (4): 641-651
Identifying genomic variation is a crucial step for unraveling the relationship between genotype and phenotype and can yield important insights into human diseases. Prevailing methods rely on cost-intensive whole-genome sequencing (WGS) or whole-exome sequencing (WES) approaches while the identification of genomic variants from often existing RNA sequencing (RNA-seq) data remains a challenge because of the intrinsic complexity in the transcriptome. Here, we present a highly accurate approach termed SNPiR to identify SNPs in RNA-seq data. We applied SNPiR to RNA-seq data of samples for which WGS and WES data are also available and achieved high specificity and sensitivity. Of the SNPs called from the RNA-seq data, >98% were also identified by WGS or WES. Over 70% of all expressed coding variants were identified from RNA-seq, and comparable numbers of exonic variants were identified in RNA-seq and WES. Despite our method's limitation in detecting variants in expressed regions only, our results demonstrate that SNPiR outperforms current state-of-the-art approaches for variant detection from RNA-seq data and offers a cost-effective and reliable alternative for SNP discovery.
View details for DOI 10.1016/j.ajhg.2013.08.008
View details for PubMedID 24075185
Identifying RNA editing sites using RNA sequencing data alone
2013; 10 (2): 128-132
We show that RNA editing sites can be called with high confidence using RNA sequencing data from multiple samples across either individuals or species, without the need for matched genomic DNA sequence. We identified many previously unidentified editing sites in both humans and Drosophila; our results nearly double the known number of human protein recoding events. We also found that human genes harboring conserved editing sites within Alu repeats are enriched for neuronal functions.
View details for DOI 10.1038/NMETH.2330
View details for Web of Science ID 000314623900018
View details for PubMedID 23291724
A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes.
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in humans takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected over 1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultra-deep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high-coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
View details for DOI 10.1101/gr.164749.113
View details for PubMedID 24347612
Lack of evidence for existence of noncanonical RNA editing
NATURE BIOTECHNOLOGY
2013; 31 (1): 19-20
RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development
2013; 23 (1): 201-216
The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired-end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ~13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.
View details for DOI 10.1101/gr.141424.112
View details for Web of Science ID 000312963400019
View details for PubMedID 22960373
Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (52): 21301-21306
A host of observations demonstrating the relationship between nuclear architecture and processes such as gene expression have led to a number of new technologies for interrogating chromosome positioning. Whereas some of these technologies reconstruct intermolecular interactions, others have enhanced our ability to visualize chromosomes in situ. Here, we describe an oligonucleotide- and PCR-based strategy for fluorescence in situ hybridization (FISH) and a bioinformatic platform that enables this technology to be extended to any organism whose genome has been sequenced. The oligonucleotide probes are renewable, highly efficient, and able to robustly label chromosomes in cell culture, fixed tissues, and metaphase spreads. Our method gives researchers precise control over the sequences they target and allows for single and multicolor imaging of regions ranging from tens of kilobases to megabases with the same basic protocol. We anticipate this technology will lead to an enhanced ability to visualize interphase and metaphase chromosomes.
View details for DOI 10.1073/pnas.1213818110
View details for Web of Science ID 000313627700041
View details for PubMedID 23236188
The difficult calls in RNA editing
NATURE BIOTECHNOLOGY
2012; 30 (12): 1207-1209
Activity-Dependent A-to-I RNA Editing in Rat Cortical Neurons
2012; 192 (1): 281-U569
Changes in neural activity influence synaptic plasticity/scaling, gene expression, and epigenetic modifications. We present the first evidence that short-term and persistent changes in neural activity can alter adenosine-to-inosine (A-to-I) RNA editing, a post-transcriptional site-specific modification found in several neuron-specific transcripts. In rat cortical neuron cultures, activity-dependent changes in A-to-I RNA editing in coding exons are present after 6 hr of high potassium depolarization but not after 1 hr and require calcium entry into neurons. When treatments are extended from hours to days, we observe a negative feedback phenomenon: Chronic depolarization increases editing at many sites and chronic silencing decreases editing. We present several different modulations of neural activity that change the expression of different mRNA isoforms through editing.
View details for DOI 10.1534/genetics.112.141200
View details for Web of Science ID 000309001800021
View details for PubMedID 22714409
A public resource facilitating clinical use of genomes
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109 (30): 11920-11927
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
View details for DOI 10.1073/pnas.1201904109
View details for Web of Science ID 000306992700018
View details for PubMedID 22797899
Accurate identification of human Alu and non-Alu RNA editing sites
2012; 9 (6): 579-?
We developed a computational framework to robustly identify RNA editing sites using transcriptome and genome deep-sequencing data from the same individual. As compared with previous methods, our approach identified a large number of Alu and non-Alu RNA editing sites with high specificity. We also found that editing of non-Alu sites appears to be dependent on nearby edited Alu sites, possibly through the locally formed double-stranded RNA structure.
View details for DOI 10.1038/NMETH.1982
View details for Web of Science ID 000304778500021
View details for PubMedID 22484847
Comment on "Widespread RNA and DNA Sequence Differences in the Human Transcriptome"
2012; 335 (6074)
Li et al. (Research Articles, 1 July 2011, p. 53; published online 19 May 2011) reported widespread differences between the RNA and DNA sequences of the same human cells, including all 12 possible mismatch types. Before accepting such a fundamental claim, a deeper analysis of the sequencing data is required to discern true differences between RNA and DNA from potential artifacts.
View details for DOI 10.1126/science.1210624
View details for Web of Science ID 000301531600026
View details for PubMedID 22422964
Comparative RNA editing in autistic and neurotypical cerebella.
Adenosine-to-inosine (A-to-I) RNA editing is a neurodevelopmentally regulated epigenetic modification shown to modulate complex behavior in animals. Little is known about human A-to-I editing, but it is thought to constitute one of many molecular mechanisms connecting environmental stimuli and behavioral outputs. Thus, comprehensive exploration of A-to-I RNA editing in human brains may shed light on gene-environment interactions underlying complex behavior in health and disease. Synaptic function is a main target of A-to-I editing, which can selectively recode key amino acids in synaptic genes, directly altering synaptic strength and duration in response to environmental signals. Here, we performed a high-resolution survey of synaptic A-to-I RNA editing in a human population, and examined how it varies in autism, a neurodevelopmental disorder in which synaptic abnormalities are a common finding. Using ultra-deep (>1000×) sequencing, we quantified the levels of A-to-I editing of 10 synaptic genes in postmortem cerebella from 14 neurotypical and 11 autistic individuals. A high dynamic range of editing levels was detected across individuals and editing sites, from 99.6% to below detection limits. In most sites, the extreme ends of the population editing distributions were individuals with autism. Editing was correlated with isoform usage, clusters of correlated sites were identified, and differential editing patterns examined. Finally, a dysfunctional form of the editing enzyme adenosine deaminase acting on RNA B1 was found more commonly in postmortem cerebella from individuals with autism. These results provide a population-level, high-resolution view of A-to-I RNA editing in human cerebella and suggest that A-to-I editing of synaptic genes may be informative for assessing the epigenetic risk for autism.
Molecular Psychiatry, advance online publication, 7 August 2012
View details for DOI 10.1038/mp.2012.118
View details for PubMedID 22869036
Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips
2010; 28 (12): 1295-U108
Development of cheap, high-throughput and reliable gene synthesis methods will broadly stimulate progress in biology and biotechnology. Currently, the reliance on column-synthesized oligonucleotides as a source of DNA limits further cost reductions in gene synthesis. Oligonucleotides from DNA microchips can reduce costs by at least an order of magnitude, yet efforts to scale their use have been largely unsuccessful owing to the high error rates and complexity of the oligonucleotide mixtures. Here we use high-fidelity DNA microchips, selective oligonucleotide pool amplification, optimized gene assembly protocols and enzymatic error correction to develop a method for highly parallel gene synthesis. We tested our approach by assembling 47 genes, including 42 challenging therapeutic antibody sequences, encoding a total of ~35 kilobase pairs of DNA. These assemblies were performed from a complex background containing 13,000 oligonucleotides encoding ~2.5 megabases of DNA, which is at least 50 times larger than in previously published attempts.
View details for DOI 10.1038/nbt.1716
View details for Web of Science ID 000285088400024
View details for PubMedID 21113165
Sequence based identification of RNA editing sites
2010; 7 (2): 248-252
RNA editing diversifies the human transcriptome beyond the genomic repertoire. In recent years, a strategy based on genomics and computational sequence analysis has proven to be a powerful tool for the identification and characterization of RNA editing. In particular, analysis of the human transcriptome has resulted in the identification of thousands of A-to-I editing sites within genomic repeats, as well as a few hundred sites located outside repeats. We review these recent advancements, emphasizing the principles underlying the various methods used. Possible directions for extending these methods are discussed.
View details for Web of Science ID 000282761300020
View details for PubMedID 20215866
A Robust Approach to Identifying Tissue-Specific Gene Expression Regulatory Variants Using Personalized Human Induced Pluripotent Stem Cells
2009; 5 (11)
Normal variation in gene expression due to regulatory polymorphisms is often masked by biological and experimental noise. In addition, some regulatory polymorphisms may become apparent only in specific tissues. We derived human induced pluripotent stem (iPS) cells from adult skin primary fibroblasts and attempted to detect tissue-specific cis-regulatory variants using in vitro cell differentiation. We used padlock probes and high-throughput sequencing for digital RNA allelotyping and measured allele-specific gene expression in primary fibroblasts, lymphoblastoid cells, iPS cells, and their differentiated derivatives. We show that allele-specific expression is both cell type- and genotype-dependent, but the majority of detectable allele-specific expression loci remain consistent despite large changes in the cell type or the experimental condition following iPS reprogramming, except on the X chromosome. We show that our approach to mapping cis-regulatory variants reduces in vitro experimental noise and reveals additional tissue-specific variants using skin-derived human iPS cells.
View details for DOI 10.1371/journal.pgen.1000718
View details for Web of Science ID 000272419500014
View details for PubMedID 19911041
Multiplex padlock targeted sequencing reveals human hypermutable CpG variations
2009; 19 (9): 1606-1615
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human-chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
View details for DOI 10.1101/gr.092213.109
View details for Web of Science ID 000269482200011
View details for PubMedID 19525355
Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human
2009; 6 (8): 613-U90
We developed a digital RNA allelotyping method for quantitatively interrogating allele-specific gene expression. This method involves ultra-deep sequencing of padlock-captured single-nucleotide polymorphisms (SNPs) from the transcriptome. We characterized four cell lines established from two human subjects in the Personal Genome Project. Approximately 11-22% of the heterozygous mRNA-associated SNPs showed allele-specific expression in each cell line and 4.3-8.5% were tissue-specific, suggesting the presence of tissue-specific cis regulation. When we applied allelotyping to two pairs of sibling human embryonic stem cell lines, the sibling lines were more similar in allele-specific expression than were the genetically unrelated lines. We found that the variation of allelic ratios in gene expression among different cell lines was primarily explained by genetic variations, much more so than by specific tissue types or growth conditions. Comparison of expressed SNPs on the sense and antisense transcripts suggested that allelic ratios are primarily determined by cis-regulatory mechanisms on the sense transcripts.
View details for DOI 10.1038/nmeth.1357
View details for Web of Science ID 000268493700024
View details for PubMedID 19620972
Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing
2009; 324 (5931): 1210-1213
Adenosine-to-inosine (A-to-I) RNA editing leads to transcriptome diversity and is important for normal brain function. To date, only a handful of functional sites have been identified in mammals. We developed an unbiased assay to screen more than 36,000 computationally predicted nonrepetitive A-to-I sites using massively parallel target capture and DNA sequencing. A comprehensive set of several hundred human RNA editing sites was detected by comparing genomic DNA with RNAs from seven tissues of a single individual. Specificity of our profiling was supported by observations of enrichment with known features of targets of adenosine deaminases acting on RNA (ADAR) and validation by means of capillary sequencing. This efficient approach greatly expands the repertoire of RNA editing targets and can be applied to studies involving RNA editing-related human diseases.
View details for DOI 10.1126/science.1170995
View details for Web of Science ID 000266410100049
View details for PubMedID 19478186
Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells
2009; 27 (4): 361-368
Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed approximately 10,000 bisulfite padlock probes to profile approximately 7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for approximately 1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.
View details for DOI 10.1038/nbt.1533
View details for Web of Science ID 000264971800022
View details for PubMedID 19329998
Multiplex amplification of large sets of human exons
2007; 4 (11): 931-936
A new generation of technologies is poised to reduce DNA sequencing costs by several orders of magnitude. But our ability to fully leverage the power of these technologies is crippled by the absence of suitable 'front-end' methods for isolating complex subsets of a mammalian genome at a scale that matches the throughput at which these platforms will routinely operate. We show that targeting oligonucleotides released from programmable microarrays can be used to capture and amplify approximately 10,000 human exons in a single multiplex reaction. Additionally, we show integration of this protocol with ultra-high-throughput sequencing for targeted variation discovery. Although the multiplex capture reaction is highly specific, we found that nonuniform capture is a key issue that will need to be resolved by additional optimization. We anticipate that highly multiplexed methods for targeted amplification will enable the comprehensive resequencing of human exons at a fraction of the cost of whole-genome resequencing.
View details for DOI 10.1038/NMETH1110
View details for Web of Science ID 000250575700018
View details for PubMedID 17934468
Procom: a web-based tool to compare multiple eukaryotic proteomes
2005; 21 (8): 1693-1694
Each organism has traits that are shared with some, but not all, organisms. Identification of genes needed for a particular trait can be accomplished by a comparative genomics approach using three or more organisms. Genes that occur in organisms without the trait are removed from the set of genes in common among organisms with the trait. To facilitate these comparisons, a web-based server, Procom, was developed to identify the subset of genes that may be needed for a trait. The Procom program is freely available with documentation and examples at http://ural.wustl.edu/~billy/Procom.
View details for DOI 10.1093/bioinformatics/bti161
View details for Web of Science ID 000228401800058
View details for PubMedID 15564299
Comparative genomics identifies a flagellar and basal body proteome that includes the BBS5 human disease gene
2004; 117 (4): 541-552
Cilia and flagella are microtubule-based structures nucleated by modified centrioles termed basal bodies. These biochemically complex organelles have more than 250 and 150 polypeptides, respectively. To identify the proteins involved in ciliary and basal body biogenesis and function, we undertook a comparative genomics approach that subtracted the nonflagellated proteome of Arabidopsis from the shared proteome of the ciliated/flagellated organisms Chlamydomonas and human. We identified 688 genes that are present exclusively in organisms with flagella and basal bodies and validated these data through a series of in silico, in vitro, and in vivo studies. We then applied this resource to the study of human ciliation disorders and have identified BBS5, a novel gene for Bardet-Biedl syndrome. We show that this novel protein localizes to basal bodies in mouse and C. elegans, is under the regulatory control of daf-19, and is necessary for the generation of both cilia and flagella.
View details for Web of Science ID 000221458000013
View details for PubMedID 15137946
Analysis of Chlamydomonas reinhardtii genome structure using large-scale sequencing of regions on linkage groups I and III
JOURNAL OF EUKARYOTIC MICROBIOLOGY
2003; 50 (3): 145-155
Chlamydomonas reinhardtii is a unicellular green alga that has been used as a model organism for the study of flagella and basal bodies as well as photosynthesis. This report analyzes finished genomic DNA sequence for 0.5% of the nuclear genome. We have used three gene prediction programs as well as EST and protein homology data to estimate the total number of genes in Chlamydomonas to be between 12,000 and 16,400. Chlamydomonas appears to have many more genes than any other unicellular organism sequenced to date. Twenty-seven percent of the predicted genes have significant identity to both ESTs and known proteins in other organisms, 32% of the predicted genes have significant identity to ESTs alone, and 14% have significant similarity to known proteins in other organisms. For gene prediction in Chlamydomonas, GreenGenie appeared to have the highest sensitivity and specificity at the exon level, scoring 71% and 82%, respectively. Two new alternative splicing events were predicted by aligning Chlamydomonas ESTs to the genomic sequence. Finally, recombination differs between the two sequenced contigs: the 350-kb linkage group III contig is devoid of recombination, while the linkage group I contig spans 30 map units over 33 kb.
View details for Web of Science ID 000183473600001
View details for PubMedID 12836870