Marc Salit leads the Genome-Scale Measurements Group at the US National Institute of Standards and Technology. This group develops standards and methods for 21st-century bioscience, with an emphasis on building the tools that let innovation translate into commercial and clinical practice. He leads efforts in the Genome in a Bottle Consortium, the External RNA Controls Consortium, and the Synthetic Biology Standards Consortium to make sure the community guides efforts to make the right standards in a .
Marc is a founder of the Joint Initiative for Metrology in Biology, working closely with Stanford University colleagues to build a new scientific institution devoted to biometrology -- the study of measurement science in biology.
Development and Characterization of Reference Materials for Genetic Testing: Focus on Public Partnerships
ANNALS OF LABORATORY MEDICINE
2016; 36 (6): 513-520
Characterized reference materials (RMs) are needed for clinical laboratory test development and validation, quality control procedures, and proficiency testing to assure their quality. In this article, we review the development and characterization of RMs for clinical molecular genetic tests. We describe various types of RMs and how to access and utilize them, especially focusing on the Genetic Testing Reference Materials Coordination Program (Get-RM) and the Genome in a Bottle (GIAB) Consortium. This review also reinforces the need for collaborative efforts in the clinical genetic testing community to develop additional RMs.
View details for DOI 10.3343/alm.2016.36.6.513
View details for Web of Science ID 000383248900002
View details for PubMedID 27578503
When Wavelengths Collide: Bias in Cell Abundance Measurements Due to Expressed Fluorescent Proteins
ACS SYNTHETIC BIOLOGY
2016; 5 (9): 1024-1027
The abundance of bacteria in liquid culture is commonly inferred by measuring optical density at 600 nm. Red fluorescent proteins (RFPs) can strongly absorb light at 600 nm. Increasing RFP expression can falsely inflate apparent cell density and lead to underestimations of mean per-cell fluorescence by up to 10%. Measuring optical density at 700 nm would allow estimation of cell abundance unaffected by the presence of nearly all fluorescent proteins.
View details for DOI 10.1021/acssynbio.6b00072
View details for Web of Science ID 000383641400015
View details for PubMedID 27187075
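The direction of the bias follows from simple arithmetic. This is a minimal sketch that assumes two things: RFP absorbance at 600 nm adds linearly to the cell-scattering OD, and per-cell fluorescence is estimated as total fluorescence divided by OD. The function names and the culture values are hypothetical illustrations, not the authors' analysis.

```python
def per_cell_fluorescence(fluorescence: float, od: float) -> float:
    """Estimate mean per-cell fluorescence as fluorescence normalized to OD."""
    return fluorescence / od


def relative_bias(true_od: float, rfp_absorbance: float) -> float:
    """Fractional error in per-cell fluorescence when RFP absorbance inflates OD600.

    Assumes (hypothetically) that measured OD600 = cell-scattering OD + RFP absorbance.
    """
    measured_od = true_od + rfp_absorbance
    fluorescence = 1.0  # arbitrary units; cancels in the ratio
    estimated = per_cell_fluorescence(fluorescence, measured_od)
    actual = per_cell_fluorescence(fluorescence, true_od)
    return (estimated - actual) / actual


# Hypothetical culture: true OD 0.50, with RFP adding 0.05 absorbance units at 600 nm.
print(f"{relative_bias(0.50, 0.05):.1%}")  # prints -9.1%
```

Under the same assumptions, measuring at a wavelength where the fluorescent protein does not absorb (the abstract proposes 700 nm) corresponds to `rfp_absorbance` near zero, and the bias disappears.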
In Vivo Site-Specific Protein Tagging with Diverse Amines Using an Engineered Sortase Variant
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
2016; 138 (24): 7496-7499
Chemoenzymatic modification of proteins is an attractive option to create highly specific conjugates for therapeutics, diagnostics, or materials under gentle biological conditions. However, these methods often suffer from expensive specialized substrates, bulky fusion tags, low yields, and extra purification steps to achieve the desired conjugate. Staphylococcus aureus sortase A and its engineered variants are used to attach oligoglycine derivatives to the C-terminus of proteins expressed with a minimal LPXTG tag. This strategy has been used extensively for bioconjugation in vitro and for protein-protein conjugation in living cells. Here we show that an enzyme variant recently engineered for higher activity on oligoglycine has promiscuous activity that allows proteins to be tagged using a diverse array of small, commercially available amines, including several bioorthogonal functional groups. This technique can also be carried out in living Escherichia coli, enabling simple, inexpensive production of chemically functionalized proteins with no additional purification steps.
View details for DOI 10.1021/jacs.6b03836
View details for Web of Science ID 000378584600013
View details for PubMedID 27280683
A research roadmap for next-generation sequencing informatics
SCIENCE TRANSLATIONAL MEDICINE
2016; 8 (335)
Next-generation sequencing technologies are fueling a wave of new diagnostic tests. Progress on a key set of nine research challenge areas will help generate the knowledge required to effectively advance these diagnostics to the clinic.
View details for DOI 10.1126/scitranslmed.aaf7314
View details for PubMedID 27099173
PEPR: pipelines for evaluating prokaryotic references
ANALYTICAL AND BIOANALYTICAL CHEMISTRY
2016; 408 (11): 2975-2983
The rapid adoption of microbial whole genome sequencing in public health, clinical testing, and forensic laboratories requires the use of validated measurement processes. Well-characterized, homogeneous, and stable microbial genomic reference materials can be used to evaluate measurement processes, improving confidence in microbial whole genome sequencing results. We have developed a reproducible and transparent bioinformatics tool, PEPR, Pipelines for Evaluating Prokaryotic References, for characterizing the reference genome of prokaryotic genomic materials. PEPR evaluates the quality, purity, and homogeneity of the reference material genome, and purity of the genomic material. The quality of the genome is evaluated using high coverage paired-end sequence data; coverage, paired-end read size and direction, as well as soft-clipping rates, are used to identify mis-assemblies. The homogeneity and purity of the material relative to the reference genome are characterized by comparing base calls from replicate datasets generated using multiple sequencing technologies. Genomic purity of the material is assessed by checking for DNA contaminants. We demonstrate the tool and its output using sequencing data while developing a Staphylococcus aureus candidate genomic reference material. PEPR is open source and available at https://github.com/usnistgov/pepr .
View details for DOI 10.1007/s00216-015-9299-5
View details for Web of Science ID 000374110700028
View details for PubMedID 26935931
Evaluation of microbial qPCR workflows using engineered Saccharomyces cerevisiae
Biomolecular detection and quantification
2016; 7: 27-33
We describe the development and interlaboratory study of modified Saccharomyces cerevisiae as a candidate material to evaluate a full detection workflow including DNA extraction and quantitative polymerase chain reaction (qPCR). S. cerevisiae NE095 was prepared by stable insertion of DNA sequence External RNA Control Consortium-00095 into S. cerevisiae BY4739 to confer selectivity. For the interlaboratory study, a binomial regression model was used to select three cell concentrations, high (4 × 10⁷ cells/mL), intermediate (4 × 10⁵ cells/mL) and low (4 × 10³ cells/mL), and the number of samples per concentration. Seven participants, including potential end users, had combined rates of positive qPCR detection (quantification cycle < 37) of 100%, 40%, and 0% for high, intermediate, and low concentrations, respectively. The NE095 strain was successfully detected by all participants, with the high concentration indicating a potential target concentration for a reference material. The engineered yeast has potential to support measurement assurance for the analytical process of qPCR, encompassing the method, equipment, and operator, to increase confidence in results and better inform decision-making in areas of applied microbiology. This material can also support process assessment for other DNA-based detection technologies.
View details for DOI 10.1016/j.bdq.2016.01.001
View details for PubMedID 27077050
- svclassify: a method to establish benchmark structural variant calls BMC GENOMICS 2016; 17
Medical implications of technical accuracy in genome sequencing
2016; 8 (1): 24
As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome. We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation. We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering, while most false negative WES variants relate to poor coverage. We find that only 74.6% of the exonic bases in ClinVar and OMIM genes and 82.1% of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions, while 593 of 3,300 ClinVar/OMIM genes have less than 50% of their total exonic base pairs in high-confidence regions. We find that more than 77% of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases.
Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2. Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.
View details for DOI 10.1186/s13073-016-0269-0
View details for PubMedID 26932475
View details for PubMedCentralID PMC4774017
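Sensitivity and positive predictive value (PPV) against a benchmark restricted to high-confidence regions can be sketched as follows. The sketch assumes variants are keyed by exact (chromosome, position, ref, alt) matches and that regions are half-open intervals. Dedicated comparison tools also reconcile different representations of the same variant, which this illustration does not attempt. All names and data here are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representations: a variant is (chromosome, position, ref allele, alt allele);
# a high-confidence region is (chromosome, start, end), half-open [start, end).
Variant = Tuple[str, int, str, str]
Region = Tuple[str, int, int]


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """True if the variant's position falls inside any high-confidence region."""
    chrom, pos, _, _ = variant
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def benchmark(calls: Set[Variant], truth: Set[Variant],
              regions: List[Region]) -> Tuple[float, float]:
    """Return (sensitivity, PPV), counting only variants inside confident regions."""
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_hc & truth_hc)  # called and present in the benchmark
    fn = len(truth_hc - calls_hc)  # benchmark variants that were missed
    fp = len(calls_hc - truth_hc)  # calls not supported by the benchmark
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr1", 300, "G", "A"),
         ("chr1", 400, "T", "C"), ("chr2", 50, "A", "C")}
print(benchmark(calls, truth, [("chr1", 0, 1000)]))
```

In this toy example, the chr2 call is ignored because it lies outside the confident region. The call at chr1:400 counts as a false positive because the region is assumed to be fully characterized, so sensitivity and PPV are both 2/3.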
Extensive sequencing of seven human genomes to characterize benchmark reference materials
2016; 3: 160025
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazi Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
View details for DOI 10.1038/sdata.2016.25
View details for PubMedID 27271295
View details for PubMedCentralID PMC4896128
Toward achieving harmonization in a nano-cytotoxicity assay measurement through an interlaboratory comparison study
Design and development of reliable cell-based nanotoxicology assays are important for evaluation of potentially hazardous engineered nanomaterials. Challenges to producing a reliable assay protocol include working with nanoparticle dispersions and living cell lines, and the potential for nano-related interference effects. Here we demonstrate the use of a 96-well plate design with several measurement controls and an interlaboratory comparison study involving five laboratories to characterize the robustness of a nano-cytotoxicity MTS cell viability assay. The consensus EC50 values were 22.1 mg/l (95 % confidence intervals 16.9 mg/l to 27.2 mg/l) and 52.6 mg/l (44.1 mg/l to 62.6 mg/l) for the A549 cell line from ATCC for positively charged polystyrene nanoparticles for the serum free and serum conditions, respectively, and were 49.7 μmol/l (47.5 μmol/l to 51.5 μmol/l) and 77.0 μmol/l (54.3 μmol/l to 99.4 μmol/l) for positive chemical control cadmium sulfate for the serum free and serum conditions, respectively. Results from the measurement controls can be used to evaluate the sources of variability and their relative magnitudes within and between laboratories. This information revealed steps of the protocol that may need to be modified to improve the overall robustness and precision. The results suggest that protocol details such as cell line ID, media exchange, cell handling, and nanoparticle dispersion are critical to ensure protocol robustness and comparability of nano-cytotoxicity assay results. The combination of system control measurements and interlaboratory comparison data yielded insights that would not have been available by either approach by itself.
View details for PubMedID 27684074
Minimum information for reporting next generation sequence genotyping (MIRING): Guidelines for reporting HLA and KIR genotyping via next generation sequencing
2015; 76 (12): 954-962
The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information - message annotation, reference context, full genotype, consensus sequence and novel polymorphism - and references to three categories of accessory information - NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of these NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.
View details for DOI 10.1016/j.humimm.2015.09.011
View details for Web of Science ID 000366437900012
View details for PubMedID 26407912
View details for PubMedCentralID PMC4674382
Unmet needs: Research helps regulators do their jobs
SCIENCE TRANSLATIONAL MEDICINE
2015; 7 (315)
A plethora of innovative new medical products along with the need to apply modern technologies to medical-product evaluation has spurred seminal opportunities in regulatory sciences. Here, we provide eight examples of regulatory science research for diverse products. Opportunities abound, particularly in data science and precision health.
View details for DOI 10.1126/scitranslmed.aac4369
View details for Web of Science ID 000366135900002
View details for PubMedID 26606966
- Advancing Benchmarks for Genome Sequencing. Cell systems 2015; 1 (3): 176-177
Using mixtures of biological samples as process controls for RNA-sequencing experiments
Genome-scale "-omics" measurements are challenging to benchmark due to the enormous variety of unique biological molecules involved. Mixtures of previously characterized samples can be used to benchmark repeatability and reproducibility, using component proportions as truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls. We apply a linear model to total RNA mixture samples in RNA-Seq experiments; the model describes the behavior of mixture expression measures and provides a context for performance benchmarking. The parameters of the model fit to experimental results can be evaluated to assess the bias and variability of the measurement of a mixture. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement, while identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures, which for RNA-Seq requires knowledge of the post-enrichment 'target RNA' content of the individual total RNA components. We demonstrate and evaluate an experimental method suitable for use in genome-scale process control and lay out a method utilizing spike-in controls to determine the enriched RNA content of total RNA in samples. Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The target RNA fraction accounts for differential selection of RNA out of variable total RNA samples. Spike-in controls can be utilized to measure this relationship between target RNA content and input total RNA.
Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.
View details for DOI 10.1186/s12864-015-1912-7
View details for Web of Science ID 000361353400004
View details for PubMedID 26383878
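For the simplest two-component case, the linear mixture model reduces to mix ≈ p·A + (1 − p)·B. The proportion p then has a closed-form least-squares estimate, and the per-transcript residuals are the quantities the abstract proposes as a performance metric. This is a minimal sketch assuming expression values are already normalized so that the mixture is a plain proportion-weighted sum. It deliberately ignores the target-RNA-fraction correction the abstract describes, and the function names and data are hypothetical.

```python
from typing import List, Sequence


def estimate_proportion(mix: Sequence[float], a: Sequence[float],
                        b: Sequence[float]) -> float:
    """Least-squares estimate of p in the linear model mix ≈ p*a + (1 - p)*b.

    Rearranged as (mix - b) ≈ p * (a - b), this is a one-parameter
    regression through the origin with a closed-form solution.
    """
    num = sum((m - y) * (x - y) for m, x, y in zip(mix, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    if den == 0:
        raise ValueError("pure components are identical; p is not identifiable")
    return num / den


def residuals(mix: Sequence[float], a: Sequence[float],
              b: Sequence[float], p: float) -> List[float]:
    """Per-transcript deviation of the measured mixture from the fitted model."""
    return [m - (p * x + (1 - p) * y) for m, x, y in zip(mix, a, b)]


# Hypothetical normalized expression for four transcripts in two pure samples.
a = [100.0, 10.0, 50.0, 5.0]
b = [10.0, 100.0, 50.0, 80.0]
mix = [0.75 * x + 0.25 * y for x, y in zip(a, b)]  # an ideal 3:1 mixture
p = estimate_proportion(mix, a, b)
print(round(p, 6), residuals(mix, a, b, p))
```

Transcripts with equal abundance in both components (the third one here) carry no information about p. This is why the estimator weights each transcript by its squared difference between components. In real data, large residuals would flag transcripts whose measured response departs from the linear model.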
Best practices for evaluating single nucleotide variant calling methods for microbial genomics
FRONTIERS IN GENETICS
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
View details for DOI 10.3389/fgene.2015.00235
View details for Web of Science ID 000359651200001
View details for PubMedID 26217378
- Good laboratory practice for clinical next-generation sequencing informatics pipelines NATURE BIOTECHNOLOGY 2015; 33 (7): 689-693
Use of Cause-and-Effect Analysis to Design a High-Quality Nanocytotoxicology Assay
Chemical research in toxicology
2015; 28 (1): 21-30
An important consideration in developing standards and regulations that govern the production and use of commercial nanoscale materials is the development of robust and reliable measurements to monitor the potential adverse biological effects of such products. These measurements typically require cell-based and other biological assays that provide an assessment of the risks associated with the nanomaterial of interest. In this perspective, we describe the use of cause-and-effect (C&E) analysis to design robust, high quality cell-based assays to test nanoparticle-related cytotoxicity. C&E analysis of an assay system identifies the sources of variability that influence the test result. These sources can then be used to design control experiments that aid in establishing the validity of a test result. We demonstrate the application of C&E analysis to the commonly used 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium (MTS) cell-viability assay. This is the first time to our knowledge that C&E analysis has been used to characterize a cell-based toxicity assay. We propose the use of a 96-well plate layout which incorporates a range of control experiments to assess multiple factors such as nanomaterial interference, pipetting accuracy, cell seeding density, and instrument performance, and demonstrate the performance of the assay using the plate layout in a case study. While the plate layout was formulated specifically for the MTS assay, it is applicable to other cytotoxicity, ecotoxicity (i.e., bacteria toxicity), and nanotoxicity assays after assay-specific modifications.
View details for DOI 10.1021/tx500327y
View details for PubMedID 25473822
Achieving high-sensitivity for clinical applications using augmented exome sequencing
2015; 7 (1): 71
Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms).
Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding those observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.
View details for DOI 10.1186/s13073-015-0197-4
View details for PubMedID 26269718
Ontology analysis of global gene expression differences of human bone marrow stromal cells cultured on 3D scaffolds or 2D films
2014; 35 (25): 6716-6726
Differences in gene expression of human bone marrow stromal cells (hBMSCs) during culture in three-dimensional (3D) nanofiber scaffolds or on two-dimensional (2D) films were investigated via pathway analysis of microarray mRNA expression profiles. Previous work has shown that hBMSC culture in nanofiber scaffolds can induce osteogenic differentiation in the absence of osteogenic supplements (OS). Analysis using ontology databases revealed that nanofibers and OS regulated similar pathways and that both were enriched for TGF-β and cell-adhesion/ECM-receptor pathways. The most notable difference between the two was that nanofibers had stronger enrichment for cell-adhesion/ECM-receptor pathways. Comparison of nanofiber scaffolds with flat films yielded stronger differences in gene expression than comparison of nanofibers made from different polymers, suggesting that substrate structure had stronger effects on cell function than substrate polymer composition. These results demonstrate that physical (nanofibers) and biochemical (OS) signals regulate similar ontological pathways, suggesting that these cues use similar molecular mechanisms to control hBMSC differentiation.
View details for DOI 10.1016/j.biomaterials.2014.04.075
View details for Web of Science ID 000338386800006
View details for PubMedID 24840613
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls
2014; 32 (3): 246-251
Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
View details for DOI 10.1038/nbt.2835
View details for Web of Science ID 000332819800022
View details for PubMedID 24531798
Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures
2014; 5: 5125
There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.
View details for DOI 10.1038/ncomms6125
View details for PubMedID 25254650
Characterization of in vitro transcription amplification linearity and variability in the low copy number regime using External RNA Control Consortium (ERCC) spike-ins
ANALYTICAL AND BIOANALYTICAL CHEMISTRY
2013; 405 (1): 315-320
Using spike-in controls designed to mimic mammalian mRNA species, we used the quantitative reverse transcription polymerase chain reaction (RT-qPCR) to assess the performance of the in vitro transcription (IVT) amplification process for small samples. We focused especially on the confidence of the transcript level measurement, which is essential for differential gene expression analyses. IVT reproduced gene expression profiles down to approximately 100 absolute input copies. However, an RT-qPCR analysis of the antisense RNA showed a systematic bias against low copy number transcripts, regardless of sequence. Experiments also showed that noise increases with decreasing copy number. First-round IVT preserved the gene expression information within a sample down to the 100-copy level, regardless of total input sample amount. However, the amplification was nonlinear under low total RNA input/long IVT conditions. Variability of the amplification increased predictably with decreasing input copy number. For the small enrichments of interest in typical differential gene expression studies (e.g., twofold changes), the bias from IVT reactions is unlikely to affect the results. In limited cases, some transcript-specific differential gene expression values will need adjustment to reflect this bias. Proper experimental design with reasonable detection limits will yield differential gene expression capability even between low copy number transcripts.
View details for DOI 10.1007/s00216-012-6445-1
View details for Web of Science ID 000313064000031
View details for PubMedID 23086083
New and improved proteomics technologies for understanding complex biological systems: Addressing a grand challenge in the life sciences
2012; 12 (18): 2773-2783
This White Paper sets out a Life Sciences Grand Challenge for Proteomics Technologies to enhance our understanding of complex biological systems, link genomes with phenotypes, and bring broad benefits to the biosciences and the US economy. The paper is based on a workshop hosted by the National Institute of Standards and Technology (NIST) in Gaithersburg, MD, 14-15 February 2011, with participants from many federal R&D agencies and research communities, under the aegis of the US National Science and Technology Council (NSTC). Opportunities are identified for a coordinated R&D effort to achieve major technology-based goals and address societal challenges in health, agriculture, nutrition, energy, environment, national security, and economic development.
View details for DOI 10.1002/pmic.201270086
View details for Web of Science ID 000308644300002
View details for PubMedID 22807061
Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing
2012; 7 (7)
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being "recalibrated" (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
View details for DOI 10.1371/journal.pone.0041356
View details for Web of Science ID 000307045600022
View details for PubMedID 22859977
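The sketch below is a minimal Python illustration of the arithmetic behind comparing reported and empirical base quality in one covariate bin. It assumes the bin's bases are aligned to spike-in sequences whose true sequence is known. The function names and the +1/+2 pseudocounts are illustrative assumptions, and GATK's actual recalibration model and covariates are more elaborate.

```python
import math


def phred(p_error: float) -> float:
    """Convert an error probability to a Phred-scaled quality, Q = -10 * log10(p)."""
    return -10.0 * math.log10(p_error)


def empirical_quality(mismatches: int, observations: int) -> float:
    """Empirical quality for one bin of bases aligned to a known sequence.

    Assumption: the bases come from reads mapped to spike-in standards, so every
    mismatch is treated as a sequencing error. The +1/+2 pseudocounts are an
    illustrative choice that avoids log10(0) for error-free bins.
    """
    return phred((mismatches + 1) / (observations + 2))


# Hypothetical bin: bases reported at Q30 in one dinucleotide context.
reported_q = 30
emp_q = empirical_quality(mismatches=200, observations=100_000)
overestimate = reported_q - emp_q  # > 0 means the reported scores are too optimistic
print(f"empirical Q = {emp_q:.1f}, overestimate = {overestimate:.1f}")
```

A positive difference means the reported scores are too optimistic for that bin. This is the kind of overestimate the abstract reports for certain dinucleotide contexts.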
Mediation of Drosophila autosomal dosage effects and compensation by network interactions
2012; 13 (4)
Gene dosage change is a mild perturbation that is a valuable tool for pathway reconstruction in Drosophila. While it is often assumed that reducing gene dose by half leads to two-fold less expression, there is partial autosomal dosage compensation in Drosophila, which may be mediated by feedback or buffering in expression networks. We profiled expression in engineered flies in which gene dose was reduced from two to one. While expression of most one-dose genes was reduced, the gene-specific dose responses were heterogeneous. Expression of two-dose genes that are first-degree neighbors of one-dose genes in novel network models also changed, and the directionality of change depended on the response of one-dose genes. Our data indicate that expression perturbation propagates in network space. Autosomal compensation, or the lack thereof, is a gene-specific response, largely mediated by interactions with the rest of the transcriptome.
View details for DOI 10.1186/gb-2012-13-4-r28
View details for Web of Science ID 000308544700004
View details for PubMedID 22531030
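The following minimal Python sketch summarizes a single gene's dose response, assuming one expression value per genotype. The naive expectation that halving dose halves expression corresponds to a log2 ratio of -1. The compensation score defined here is a hypothetical convenience, not a metric from the paper.

```python
import math


def dose_response(expr_one_dose, expr_two_dose):
    """Return (log2 ratio, compensation score) for one gene.

    log2 ratio = log2(one-dose / two-dose expression):
      -1 -> expression halved along with dose (no compensation)
       0 -> expression unchanged (full compensation)
    The compensation score, 1 + log2 ratio, is a hypothetical summary that maps
    these two cases to 0 and 1; values outside [0, 1] are possible.
    """
    log2_ratio = math.log2(expr_one_dose / expr_two_dose)
    return log2_ratio, 1.0 + log2_ratio


# Hypothetical genes and expression values.
for gene, one, two in [("geneA", 50.0, 100.0), ("geneB", 75.0, 100.0), ("geneC", 100.0, 100.0)]:
    ratio, score = dose_response(one, two)
    print(f"{gene}: log2 ratio = {ratio:.2f}, compensation = {score:.2f}")
```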
The determination of stem cell fate by 3D scaffold structures through the control of cell shape
2011; 32 (35): 9188-9196
Stem cell response to a library of scaffolds with varied 3D structures was investigated. Microarray screening revealed that each type of scaffold structure induced a unique gene expression signature in primary human bone marrow stromal cells (hBMSCs). Hierarchical cluster analysis showed that treatments sorted by scaffold structure and not by polymer chemistry, suggesting that scaffold structure was more influential than scaffold composition. Further, the effects of scaffold structure on hBMSC function were mediated by cell shape. Of all the scaffolds tested, only scaffolds with a nanofibrous morphology were able to drive the hBMSCs down an osteogenic lineage in the absence of osteogenic supplements. Nanofiber scaffolds forced the hBMSCs to assume an elongated, highly branched morphology. This same morphology was seen in osteogenic controls where hBMSCs were cultured on flat polymer films in the presence of osteogenic supplements (OS). In contrast, hBMSCs cultured on flat polymer films in the absence of OS assumed a more rounded and less-branched morphology. These results indicate that cells are more sensitive to scaffold structure than previously appreciated and suggest that scaffold efficacy can be optimized by tailoring the scaffold structure to force cells into morphologies that direct them to differentiate down the desired lineage.
View details for DOI 10.1016/j.biomaterials.2011.08.054
View details for Web of Science ID 000296684200005
View details for PubMedID 21890197
Synthetic spike-in standards for RNA-seq experiments
2011; 21 (9): 1543-1551
High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20-fold concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected from pure Poisson sampling error. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length, as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances but can partially be corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
View details for DOI 10.1101/gr.121095.111
View details for Web of Science ID 000294477000014
View details for PubMedID 21816910
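The following minimal Python sketch shows the standard-curve and sampling-noise checks the abstract describes, assuming spike-in input concentrations and read densities are already tabulated. The function names and data are hypothetical. A real analysis would also model the GC-content, length, and positional biases the abstract reports.

```python
import numpy as np


def fit_standard_curve(conc, density):
    """Fit log2(density) = slope * log2(conc) + intercept across spike-ins.

    A slope near 1 means read density scales in proportion to RNA input.
    """
    slope, intercept = np.polyfit(np.log2(conc), np.log2(density), 1)
    return slope, intercept


def estimate_input(density, slope, intercept):
    """Invert the standard curve to estimate input abundance from read density."""
    return 2.0 ** ((np.log2(density) - intercept) / slope)


def dispersion_index(replicate_counts):
    """Variance-to-mean ratio of one transcript's counts across replicates.

    About 1 under pure Poisson sampling; values well above 1 indicate
    additional (extra-Poisson) imprecision.
    """
    x = np.asarray(replicate_counts, dtype=float)
    return x.var(ddof=1) / x.mean()


# Illustrative data only: spike-ins spanning 2^0 .. 2^18 in concentration.
conc = 2.0 ** np.arange(0, 20, 2)
density = 5.0 * conc
slope, intercept = fit_standard_curve(conc, density)
print(round(slope, 3), round(estimate_input(80.0, slope, intercept), 2))
print(dispersion_index([90, 100, 110]), dispersion_index([80, 100, 120]))
```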
Contributions of the EMERALD project to assessing and improving microarray data quality
2011; 50 (1): 27-31
While minimum information about a microarray experiment (MIAME) standards have helped to increase the value of the microarray data deposited into public databases like ArrayExpress and Gene Expression Omnibus (GEO), limited means have been available to assess the quality of these data or to identify the procedures used to normalize and transform raw data. The EMERALD FP6 Coordination Action was designed to deliver approaches to assess and enhance the overall quality of microarray data and to disseminate these approaches to the microarray community through an extensive series of workshops, tutorials, and symposia. Tools were developed for assessing data quality and used to demonstrate how the removal of poor-quality data could improve the power of statistical analyses and facilitate analysis of multiple joint microarray data sets. These quality metrics tools have been disseminated through publications and through the software package arrayQualityMetrics. Within the framework provided by the Ontology for Biomedical Investigations, an ontology was developed to describe data transformations, and a software ontology was developed for gene expression analysis software. In addition, the consortium has advocated for the development and use of external reference standards in microarray hybridizations and created the Molecular Methods (MolMeth) database, which provides a central source for methods and protocols focusing on microarray-based technologies.
View details for DOI 10.2144/000113591
View details for Web of Science ID 000287719600011
View details for PubMedID 21231919
Exploring the use of internal and external controls for assessing microarray technical performance
BMC research notes
2010; 3: 349
The maturing of gene expression microarray technology and interest in microarray-based clinical and diagnostic applications call for quantitative measures of quality. This manuscript presents a retrospective study characterizing several approaches to assess technical performance of microarray data measured on the Affymetrix GeneChip platform, including whole-array metrics and information from a standard mixture of external spike-in and endogenous internal controls. Spike-in controls were found to carry the same information about technical performance as whole-array metrics and endogenous "housekeeping" genes. These results support the use of spike-in controls as general tools for performance assessment across time, experimenters and array batches, suggesting that they have potential for comparison of microarray data generated across species using different technologies. A layered PCA modeling methodology that uses data from a number of classes of controls (spike-in hybridization, spike-in polyA+, internal RNA degradation, endogenous or "housekeeping genes") was used for the assessment of microarray data quality. The controls provide information on multiple stages of the experimental protocol (e.g., hybridization, RNA amplification). External spike-in, hybridization and RNA labeling controls provide information related to both assay and hybridization performance, whereas internal endogenous controls provide quality information on the biological sample. We find that the variance of the data generated from the external and internal controls carries critical information about technical performance; the PCA dissection of this variance is consistent with whole-array quality assessment based on a number of quality assurance/quality control (QA/QC) metrics. These results provide support for the use of both external and internal RNA control data to assess the technical quality of microarray experiments. 
The observed consistency amongst the information carried by internal and external controls and whole-array quality measures offers promise for rationally designed control standards for routine performance monitoring of multiplexed measurement platforms.
View details for DOI 10.1186/1756-0500-3-349
View details for PubMedID 21189145
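As an illustration only, the Python sketch below applies PCA (computed by SVD) to one class of controls. It assumes a matrix of log intensities with arrays as rows and control probes as columns. In a layered analysis of the kind described, the same function would be run separately for each control class. The paper's actual modeling choices are not reproduced here.

```python
import numpy as np


def control_pca(X, n_components=2):
    """PCA of one control class via SVD.

    X: arrays x control probes matrix of log intensities (assumed layout).
    Returns per-array scores on the leading components and the fraction of
    total variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each control probe
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Illustrative data: five arrays, three spike-in controls; array 4 is shifted.
X = [[1.0, 2.0, 3.0],
     [1.1, 2.0, 3.0],
     [1.0, 2.1, 3.0],
     [1.0, 2.0, 3.1],
     [6.0, 7.0, 8.0]]
scores, explained = control_pca(X)
print(np.round(explained, 3), int(np.argmax(np.abs(scores[:, 0]))))
```

An array with an extreme score on a leading component, like array 4 here, would be a candidate for closer technical review.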
Image-based feedback control for real-time sorting of microspheres in a microfluidic device
LAB ON A CHIP
2010; 10 (18): 2402-2410
We describe a control system to automatically distribute antibody-functionalized beads to addressable assay chambers within a PDMS microfluidic device. The system used real-time image acquisition and processing to manage the valve states required to sort beads with unit precision. The image processing component of the control system correctly counted the number of beads in 99.81% of images (2689 of 2694), with only four instances of an incorrect number of beads being sorted to an assay chamber, and one instance of inaccurately counted beads being improperly delivered to waste. Post-experimental refinement of the counting script resulted in one counting error in 2694 images of beads (99.96% accuracy). We analyzed a range of operational variables (flow pressure, bead concentration, etc.) using a statistical model to characterize those that yielded optimal sorting speed and efficiency. The integrated device was able to capture, count, and deliver beads at a rate of approximately four per minute so that bead arrays could be assembled in 32 individually addressable assay chambers for eight analytical measurements in duplicate (512 beads total) within 2.5 hours. This functionality demonstrates the successful integration of a robust control system with precision bead handling that is the enabling technology for future development of a highly multiplexed bead-based analytical device.
View details for DOI 10.1039/c004708b
View details for Web of Science ID 000281227300014
View details for PubMedID 20593069
Learning from microarray interlaboratory studies: measures of precision for gene expression
The ability to demonstrate the reproducibility of gene expression microarray results is a critical consideration for the use of microarray technology in clinical applications. While studies have asserted that microarray data can be "highly reproducible" under given conditions, there is little ability to quantitatively compare the various metrics and terminology used to characterize and express measurement performance. Use of standardized conceptual tools can greatly facilitate communication among the user, developer, and regulator stakeholders of the microarray community. While shaped by less highly multiplexed systems, measurement science (metrology) is devoted to establishing a coherent and internationally recognized vocabulary and quantitative practice for the characterization of measurement processes. The two independent aspects of the metrological concept of "accuracy" are "trueness" (closeness of a measurement to an accepted reference value) and "precision" (the closeness of measurement results to each other). A carefully designed collaborative study enables estimation of a variety of gene expression measurement precision metrics: repeatability, several flavors of intermediate precision, and reproducibility. The three 2004 Expression Analysis Pilot Proficiency Test collaborative studies, each with 13 to 16 participants, provide triplicate microarray measurements on each of two reference RNA pools. 
Using and modestly extending the consensus ISO 5725 documentary standard, we evaluate the metrological precision figures of merit for individual microarray signal measurement, building from calculations appropriate to single measurement processes, such as technical replicate expression values for individual probes on a microarray, to the estimation and display of precision functions representing all of the probes in a given platform. With only modest extensions, the established metrological framework can be fruitfully used to characterize the measurement performance of microarray and other highly multiplexed systems. Precision functions, summarizing routine precision metrics estimated from appropriately repeated measurements of one or more reference materials as functions of signal level, are demonstrated and merit further development for characterizing measurement platforms, monitoring changes in measurement system performance, and comparing performance among laboratories or analysts.
View details for DOI 10.1186/1471-2164-10-153
View details for Web of Science ID 000266804400003
View details for PubMedID 19356252
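The ISO 5725-2 estimates for a balanced design (p laboratories, n replicates each) can be sketched in Python as below. This sketch covers only repeatability and reproducibility at a single level, for example one probe. The intermediate-precision variants and the precision functions in the paper require additional factors and are not shown.

```python
import numpy as np


def iso5725_precision(data):
    """Repeatability, between-laboratory and reproducibility SDs for one level.

    data: p x n array (rows = laboratories, columns = replicates), balanced.
    Follows the one-way ANOVA estimates of ISO 5725-2 for a balanced design.
    """
    y = np.asarray(data, dtype=float)
    p, n = y.shape
    lab_means = y.mean(axis=1)
    s_r2 = ((y - lab_means[:, None]) ** 2).sum() / (p * (n - 1))   # within-lab variance
    s_d2 = n * ((lab_means - y.mean()) ** 2).sum() / (p - 1)        # between-lab mean square
    s_L2 = max((s_d2 - s_r2) / n, 0.0)                               # set to 0 if negative
    s_R2 = s_r2 + s_L2
    return np.sqrt(s_r2), np.sqrt(s_L2), np.sqrt(s_R2)


s_r, s_L, s_R = iso5725_precision([[10.0, 12.0], [14.0, 16.0]])
print(round(s_r, 3), round(s_L, 3), round(s_R, 3))
```

Applying the function probe by probe and plotting s_r or s_R against mean signal gives a simple version of the precision-function idea.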
Use of Standard Reference Material 2242 (Relative Intensity Correction Standard for Raman Spectroscopy) for microarray scanner qualification
2008; 45 (2): 143
As a critical component of any microarray experiment, scanner performance has the potential to contribute variability and bias, the magnitude of which is usually not quantified. Using Standard Reference Material (SRM) 2242, which is certified for Raman spectral correction, for monitoring the microarray fluorescence at the two most commonly used wavelengths, our team at the National Institute of Standards and Technology (NIST) has developed a method to establish scanner performance, qualifying signal measurement in microarray experiments. SRM 2242 exhibits the necessary photostability at the excitation wavelengths of 635 nm and 532 nm, which allows scanner signal stability monitoring, although it is not certified for use in this capacity. In the current study, instrument response was tracked day to day, confirming that changes observed in scanned experimental arrays are not due to changes in the scanner response. Signal intensity and signal-to-noise ratio (S/N) were tracked over time on three different scanners, indicating the utility of the SRM for scanner qualification.
View details for DOI 10.2144/000112818
View details for Web of Science ID 000263638000004
View details for PubMedID 18687063
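One generic way to track day-to-day scanner response with a stable artifact is a Shewhart-style control chart. The Python sketch below assumes a set of baseline readings of the reference material and flags later readings outside the mean ± 3 SD. This is a standard statistical process control technique, not necessarily the procedure used in this study, and the readings are illustrative.

```python
import statistics


def control_limits(baseline):
    """Mean +/- 3 SD limits from baseline readings of a stable reference artifact."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return mean - 3 * sd, mean + 3 * sd


def out_of_control(readings, limits):
    """Indices of readings falling outside the control limits."""
    lower, upper = limits
    return [i for i, r in enumerate(readings) if not lower <= r <= upper]


# Illustrative daily signal intensities for one scanner channel.
limits = control_limits([100, 102, 98, 101, 99])
print(limits, out_of_control([100, 103, 110, 96], limits))
```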
Microarray scanner performance over a five-week period as measured with Cy5 and Cy3 serial dilution slides
JOURNAL OF RESEARCH OF THE NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
2008; 113 (3): 157-174
To investigate scanner performance and guide development of an instrument qualification method, slides with replicates of successive dilutions of cyanine 5 (Cy5) and cyanine 3 (Cy3) dyes (referred to as dye slides) were scanned. The successive dilutions form a dose-response curve from which performance can be assessed. The effects of a variety of factors, including the number of scans and slide storage conditions, on scanner performance over a five-week period were investigated and tracked with time series charts of dye signal intensity, signal-to-noise (S/N), signal background, slope, and limit of detection (LOD). Scanner drift was tracked with a known stable reference material, Standard Reference Material (SRM) 2242. The greatest effect on the figures of merit was the dye: the Cy5 dye showed signs of degradation after one week of scanning, independent of all other factors, while the Cy3 dye remained relatively stable. Use of the charts to track scanner performance over time holds promise for development of a method for microarray scanner performance qualification. Although not a prescription for performance qualification, this introductory study provides sufficient information regarding the use of dye slides to enable the user to institute a preliminary test method.
View details for DOI 10.6028/jres.113.012
View details for Web of Science ID 000258573800003
View details for PubMedID 27096118
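The following minimal Python sketch computes two figures of merit from a dilution series. It assumes a linear signal-versus-concentration response and replicate background readings. The 3-SD detection criterion is a common convention assumed here. The paper's exact definitions of slope and LOD may differ, and the numbers are illustrative.

```python
import numpy as np


def dose_response_metrics(conc, signal, background):
    """Slope and limit of detection (LOD) from one dilution series.

    Assumptions: signal is linear in concentration over the fitted points,
    and LOD is taken as the concentration at which the fitted line reaches
    mean background + 3 SD of replicate background readings.
    """
    slope, intercept = np.polyfit(conc, signal, 1)
    bg = np.asarray(background, dtype=float)
    threshold = bg.mean() + 3.0 * bg.std(ddof=1)
    lod = (threshold - intercept) / slope
    return slope, lod


# Illustrative dilution series (arbitrary units), not data from the study.
slope, lod = dose_response_metrics([0, 1, 2, 3], [10, 15, 20, 25], [9, 10, 11])
print(round(slope, 3), round(lod, 3))
```

Computing these values for each scan session and charting them over time mirrors the time-series tracking described in the abstract.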
Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays
Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets, we found that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression, and it also improves cross-platform concordance. We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip description files (CDFs) and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third-party software.
View details for DOI 10.1186/1471-2105-8-108
View details for Web of Science ID 000245804400002
View details for PubMedID 17394657
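The transcript-level redefinition can be sketched as grouping probes by the exact set of transcripts each one matches, then discarding groups smaller than the minimum size of 4 reported in the abstract. The Python sketch below uses hypothetical probe and transcript identifiers. The real pipeline, including sequence matching against AceView and CDF generation, is not shown.

```python
from collections import defaultdict


def redefine_probe_sets(probe_to_transcripts, min_size=4):
    """Group probes that match exactly the same set of transcripts.

    probe_to_transcripts: dict mapping probe ID -> iterable of transcript IDs.
    Probes matching no transcript are dropped, as are groups with fewer than
    min_size probes (too small for reliable summarization).
    """
    groups = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        key = frozenset(transcripts)
        if key:
            groups[key].append(probe)
    return {key: sorted(probes) for key, probes in groups.items() if len(probes) >= min_size}


# Hypothetical probes and transcript IDs.
mapping = {f"p{i}": {"T1.a", "T1.b"} for i in range(5)}
mapping.update({"p5": {"T1.a"}, "p6": {"T1.a"}, "p7": set()})
for transcripts, probes in redefine_probe_sets(mapping).items():
    print(sorted(transcripts), probes)
```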
Evaluating the quality of data from DNA microarray measurements
Methods in molecular biology (Clifton, N.J.)
2007; 381: 121-131
Gene expression technology offers great potential to generate new insights into human disease pathogenesis; however, data quality remains a major obstacle to realizing this potential. In the present study, 60-mer oligonucleotide targets immobilized on coated glass slides were used as a model system to investigate parameters, such as target concentration, retention, signal linearity, and fluorescence properties of fluorophores, that are likely to affect the quality of microarray results. An array calibration slide was used to calibrate an Axon GenePix 4000A scanner and ensure the dynamic range of the instrument. The work is a first step toward our goal of quantitative gene expression measurements.
View details for PubMedID 17984517
Top-down standards will not serve systems biology
NATURE
2006; 440 (7080): 24
Standards in gene expression microarray experiments
DNA MICROARRAYS, PART B: DATABASES AND STATISTICS
2006; 411: 63-78
The use of standards in gene expression measurements with DNA microarrays is ubiquitous; they just are not yet the kind of standards that have yielded microarray gene expression profiles that can be readily compared across different studies and different laboratories. They also are not yet enabling microarray measurements of the known, verifiable quality needed so they can be used with confidence in genomic medicine in regulated environments.
View details for DOI 10.1016/S0076-6879(06)11005-8
View details for Web of Science ID 000244506300005
View details for PubMedID 16939786
The external RNA controls consortium: a progress report
2005; 2 (10): 731-734
Standard controls and best practice guidelines advance acceptance of data from research, preclinical and clinical laboratories by providing a means for evaluating data quality. The External RNA Controls Consortium (ERCC) is developing commonly agreed-upon and tested controls for use in expression assays, a true industry-wide standard control.
View details for Web of Science ID 000232998600009
View details for PubMedID 16179916
Single-element solution comparisons with a high-performance inductively coupled plasma optical emission spectrometric method
2001; 73 (20): 4821-4829
A solution-based inductively coupled plasma optical emission spectrometric (ICP-OES) method is described for elemental analysis with relative expanded uncertainties on the order of 0.1%. The single-element determinations of 64 different elements are presented, with aggregate performance results for the method and parameters for the determination of each element. The performance observed is superior to that previously reported for ICP-OES, resulting from a suite of technical strategies that exploit the strengths of contemporary spectrometers, address measurement and sample handling noise sources, and permit rugged operation with small uncertainty. Taken together, these strategies constitute high-performance ICP-OES.
View details for Web of Science ID 000171696800010
View details for PubMedID 11681457
Using inductively coupled plasma-mass spectrometry for calibration transfer between environmental CRMs
FRESENIUS JOURNAL OF ANALYTICAL CHEMISTRY
2001; 370 (2-3): 259-263
Multielement analyses of environmental reference materials have been performed using existing certified reference materials (CRMs) as calibration standards for inductively coupled plasma-mass spectrometry. The analyses have been performed using a high-performance methodology that results in comparison measurement uncertainties that are significantly smaller than the uncertainties of the certified values of the calibration CRM. Consequently, the determined values have uncertainties that are very nearly equivalent to the uncertainties of the calibration CRM. Several uses of this calibration transfer are proposed, including re-certification measurements of replacement CRMs, establishing traceability of one CRM to another, and demonstrating the equivalence of two CRMs. RM 8704, a river sediment, was analyzed using SRM 2704, Buffalo River Sediment, as the calibration standard. SRM 1632c, Trace Elements in Bituminous Coal, which is a replacement for SRM 1632b, was analyzed using SRM 1632b as the standard. SRM 1635, Trace Elements in Subbituminous Coal, was also analyzed using SRM 1632b as the standard.
View details for Web of Science ID 000169547300031
View details for PubMedID 11451248
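The following minimal Python sketch shows single-point calibration transfer. It assumes a linear response through zero, matched matrices, and uncorrelated relative standard uncertainties combined in quadrature. The numbers are illustrative. They show why a comparison uncertainty much smaller than the calibrant's certified uncertainty leaves the transferred value with nearly the same uncertainty as the calibrant.

```python
import math


def transfer_value(signal_unknown, signal_cal, cert_value, u_rel_cert, u_rel_comparison):
    """Value and relative standard uncertainty from single-point calibration transfer.

    Assumes a linear response through zero, matched matrices, and uncorrelated
    uncertainty components combined in quadrature.
    """
    value = cert_value * signal_unknown / signal_cal
    u_rel = math.hypot(u_rel_cert, u_rel_comparison)
    return value, u_rel


# Illustrative: calibrant certified at 50.0 mg/kg with 2 % relative uncertainty,
# comparison measurement with 0.2 % relative uncertainty.
value, u_rel = transfer_value(1200.0, 1000.0, 50.0, 0.02, 0.002)
print(f"{value:.1f} mg/kg, u_rel = {100 * u_rel:.2f} %")
```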
An ICP-OES method with 0.2% expanded uncertainties for the characterization of LiAlO2
2000; 72 (15): 3504-3511
An improved inductively coupled plasma-optical emission spectrometry (ICP-OES) method has been applied to the determination of Li and Al mass fractions and the Li/Al amount-of-substance ratio in representative samples of LiAlO2. This ICP-OES method has uncertainty on the order of 0.2% (2,3), comparable to the best analytical methods. This method is based on several strategies, which are detailed in this work. The mean measured mass fractions of Li and Al in eight samples were 0.10151 +/- 0.00016 (+/-0.16%) and 0.41068 +/- 0.00056 (+/-0.14%), and the mean Li/Al amount-of-substance ratio was 0.9793 +/- 0.0017 (+/-0.17%). The uncertainty is dominated by sample handling and heterogeneity, about a factor of 2 larger than the ICP-OES instrumental uncertainties, which were 0.04% for Al and 0.07% for Li.
View details for PubMedID 10952535
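The relative expanded uncertainties quoted in parentheses follow directly from the reported values and expanded uncertainties. The short Python check below uses only the numbers given in the abstract.

```python
def relative_percent(value, expanded_u):
    """Relative expanded uncertainty in percent."""
    return 100.0 * expanded_u / value


# Mean values and expanded uncertainties as reported in the abstract.
reported = {
    "Li mass fraction": (0.10151, 0.00016),
    "Al mass fraction": (0.41068, 0.00056),
    "Li/Al amount-of-substance ratio": (0.9793, 0.0017),
}
for name, (value, u) in reported.items():
    print(f"{name}: {relative_percent(value, u):.2f} %")
```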
A drift correction procedure
1998; 70 (15): 3184-3190
A procedure is introduced that can mitigate the deleterious effect of low-frequency noise (often termed drift) on the precision of an analytical experiment. This procedure offers several performance benefits over traditional designs based on the periodic measurement of standards to diagnose and correct for variation in instrument response. Using repeated measurements of every sample as a drift diagnostic, as opposed to requiring the periodic measurement of any given sample or standard, the analyst can better budget the measurement time to be devoted to each sample, distributing it to optimize the uncertainty of the analytical result. The drift is diagnosed from the repeated measurements, a model of the instrument response drift is constructed, and the data are corrected to a "drift-free" condition. This drift-free condition allows data to be accumulated over long periods of time with little or no loss in precision due to drift. More than 10-fold precision enhancements of analytical atomic emission results have been observed, with no statistically significant effects on the means. The procedure is described, performance data are presented, and matters regarding the procedure are discussed.
View details for Web of Science ID 000075232000022
View details for PubMedID 21644656
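The following simplified Python sketch illustrates the idea. It assumes every sample is measured several times in an interleaved order, that drift is multiplicative and smooth in time, and that readings are positive. Drift is modeled as a low-order polynomial in log response, fitted jointly with one level per sample by least squares. This is an illustrative choice, not necessarily the model used in the paper.

```python
import numpy as np


def drift_correct(times, sample_ids, readings, degree=2):
    """Correct multiplicative instrument drift using repeated sample measurements.

    Model (an illustrative assumption): log(reading) = log(level of its sample)
    + polynomial drift in time with no constant term. Sample levels and drift
    coefficients are fitted jointly by least squares, and readings are divided
    by the fitted drift, giving values relative to the response at t = 0.
    """
    t = np.asarray(times, dtype=float)
    ids = np.asarray(sample_ids)
    y = np.asarray(readings, dtype=float)
    samples = np.unique(ids)
    levels = (ids[:, None] == samples[None, :]).astype(float)     # one column per sample
    drift_terms = np.column_stack([t**k for k in range(1, degree + 1)])
    design = np.hstack([levels, drift_terms])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    drift = np.exp(drift_terms @ coef[len(samples):])
    return y / drift


# Illustrative run: samples A and B measured alternately while response drifts.
t = np.arange(10, dtype=float)
ids = np.array(["A", "B"] * 5)
true_values = np.where(ids == "A", 100.0, 50.0)
readings = true_values * np.exp(0.01 * t + 0.0005 * t**2)
print(np.round(drift_correct(t, ids, readings), 6))
```

Every sample contributes to the drift estimate, so no measurement time is spent on dedicated drift standards. This is the budgeting advantage the abstract describes.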
Practical wavelength calibration considerations for UV-visible Fourier-transform spectroscopy
1996; 35 (16): 2960-2970
The intrinsic wavelength scale in a modern reference laser-controlled Michelson interferometer, sometimes referred to as the Connes advantage, offers excellent wavelength accuracy with relative ease. Truly superb wavelength accuracy, with total relative uncertainty in line position of the order of several parts in 10^8, should be within reach with single-point, multiplicative calibration. The need for correction of the wavelength scale arises from two practical effects: the use of a finite aperture, from which off-axis rays propagate through the interferometer, and imperfect geometric alignment of the sample beam with the reference beam and the optical axis of the moving mirror. Although an analytical correction can be made for the finite-aperture effect, calibration with a trusted wavelength standard is typically used to accomplish both corrections. Practical aspects of accurate calibration of an interferometer in the UV-visible region are discussed. Critical issues regarding accurate use of a standard external to the sample source and the evaluation and selection of an appropriate standard are addressed. Anomalous results for two different potential wavelength standards measured by Fabry-Perot interferometry (Ar II and 198Hg I) are observed.
View details for Web of Science ID A1996UQ48200030
View details for PubMedID 21085448
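Single-point multiplicative calibration amounts to scaling the whole wavenumber scale by one factor derived from a trusted standard line. The minimal Python sketch below uses illustrative wavenumbers in cm^-1 rather than values from the paper.

```python
def calibrate_wavenumbers(observed, ref_observed, ref_true):
    """Single-point multiplicative correction of a wavenumber scale.

    k = ref_true / ref_observed - 1 is derived from one trusted standard line;
    every observed wavenumber is then scaled by (1 + k).
    """
    k = ref_true / ref_observed - 1.0
    return [sigma * (1.0 + k) for sigma in observed], k


def vacuum_wavelength_nm(sigma_cm1):
    """Vacuum wavelength in nm for a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / sigma_cm1


# Illustrative values: the standard line reads 25000.0 cm^-1 but should be 25000.5.
corrected, k = calibrate_wavenumbers([20000.0, 30000.0], 25000.0, 25000.5)
print(f"k = {k:.2e}", [round(s, 3) for s in corrected])
```

Because one factor corrects the entire scale, the same k absorbs both the finite-aperture and the misalignment effects, as long as both act multiplicatively.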
Wavelengths of spectral lines in mercury pencil lamps
1996; 35 (1): 74-77
The wavelengths of 19 spectral lines in the region 253-579 nm emitted by Hg pencil-type lamps were measured by Fourier-transform spectroscopy. Precise calibration of the spectra was obtained with wavelengths of 198Hg as external standards. Our recommended values should be useful as wavelength-calibration standards for moderate-resolution spectrometers at an uncertainty level of 0.0001 nm.
View details for Web of Science ID A1996TR77300009
View details for PubMedID 21068979