Academic Appointments

  • Adjunct Professor, Symbolic Systems

Professional Education

  • PostDoc, Carnegie Inst. of Science, Dept. of Plant Biology, Genomics of Photosynthetic Microbes (2008)
  • PostDoc, U Pitt, Cognitive Neuroscience (1998)
  • PostDoc, CMU, Developmental Cognitive Neuroscience (1995)
  • PhD, CMU, Cognitive and Developmental Psychology (1985)
  • MSE, U Penn, Artificial Intelligence (1981)
  • BSE, U Penn, Computer Science (1980)

All Publications

  • Is Cancer Solvable? Towards Efficient and Ethical Biomedical Science. The Journal of law, medicine & ethics : a journal of the American Society of Law, Medicine & Ethics Shrager, J., Shapiro, M., Hoos, W. 2019; 47 (3): 362–68


    Global Cumulative Treatment Analysis (GCTA) is a novel clinical research model combining expert knowledge, and treatment coordination based upon global information-gain, to treat every patient optimally while efficiently searching the vast space that is the realm of cancer research.

    View details for DOI 10.1177/1073110519876164

    View details for PubMedID 31560637

  • Prototyping a precision oncology 3.0 rapid learning platform BMC BIOINFORMATICS Sweetnam, C., Mocellin, S., Krauthammer, M., Knopf, N., Baertsch, R., Shrager, J. 2018; 19: 341


    We describe a prototype implementation of a platform that could underlie a Precision Oncology Rapid Learning system. We describe the prototype platform, and examine some important issues and details. In the Appendix we provide a complete walk-through of the prototype platform. The design choices made in this implementation rest upon ten constitutive hypotheses, which, taken together, define a particular view of how a rapid learning medical platform might be defined, organized, and implemented.

    View details for PubMedID 30257653

  • Precision medicine: Fantasy meets reality SCIENCE Shrager, J. 2016; 353 (6305): 1216–17

    View details for PubMedID 27634516

  • Rapid learning for precision oncology NATURE REVIEWS CLINICAL ONCOLOGY Shrager, J., Tenenbaum, J. M. 2014; 11 (2): 109-118


    The emerging paradigm of Precision Oncology 3.0 uses panomics and sophisticated methods of statistical reverse engineering to hypothesize the putative networks that drive a given patient's tumour, and to attack these drivers with combinations of targeted therapies. Here, we review a paradigm termed Rapid Learning Precision Oncology wherein every treatment event is considered as a probe that simultaneously treats the patient and provides an opportunity to validate and refine the models on which the treatment decisions are based. Implementation of Rapid Learning Precision Oncology requires overcoming a host of challenges that include developing analytical tools, capturing the information from each patient encounter and rapidly extrapolating it to other patients, coordinating many patient encounters to efficiently search for effective treatments, and overcoming economic, social and structural impediments, such as obtaining access to, and reimbursement for, investigational drugs.

    View details for DOI 10.1038/nrclinonc.2013.244

    View details for Web of Science ID 000331144600011

    View details for PubMedID 24445514

  • Adding Individual Patient Case Data to The Melanoma Targeted Therapy Advisor 7th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) Stevovici, J., Maxhuni, A., Khaghanifar, I., Shrager, J., Convertino, G., Gobbel, R. IEEE. 2013: 85–88
  • Blend me in: Privacy-Preserving Input Generalization for Personalized Online Services 11th Annual International Conference on Privacy, Security and Trust (PST) Baquero, A., Schiffman, A. M., Shrager, J. IEEE. 2013: 51–60
  • A Novel Classification of Lung Cancer into Molecular Subtypes PLOS ONE West, L., Vidwans, S. J., Campbell, N. P., Shrager, J., Simon, G. R., Bueno, R., Dennis, P. A., Otterson, G. A., Salgia, R. 2012; 7 (2)


    The remarkably heterogeneous nature of lung cancer has become more apparent over the last decade. In general, advanced lung cancer is an aggressive malignancy with a poor prognosis. The discovery of multiple molecular mechanisms underlying the development, progression, and prognosis of lung cancer, however, has created new opportunities for targeted therapy and improved outcome. In this paper, we define "molecular subtypes" of lung cancer based on specific actionable genetic aberrations. Each subtype is associated with molecular tests that define the subtype and drugs that may potentially treat it. We hope this paper will be a useful guide to clinicians and researchers alike by assisting in therapy decision making and acting as a platform for further study. In this new era of cancer treatment, the 'one-size-fits-all' paradigm is being forcibly pushed aside, allowing for more effective, personalized oncologic care to emerge.

    View details for DOI 10.1371/journal.pone.0031906

    View details for Web of Science ID 000302873700120

    View details for PubMedID 22363766

    View details for PubMedCentralID PMC3283716

  • Cancer: A Computational Disease That AI Can Cure AI MAGAZINE Tenenbaum, J. M., Shrager, J. 2011; 32 (2): 14-26
  • A Melanoma Molecular Disease Model PLOS ONE Vidwans, S. J., Flaherty, K. T., Fisher, D. E., Tenenbaum, J. M., Travers, M. D., Shrager, J. 2011; 6 (3)


    While advanced melanoma remains one of the most challenging cancers, recent developments in our understanding of the molecular drivers of this disease have uncovered exciting opportunities to guide personalized therapeutic decisions. Genetic analyses of melanoma have uncovered several key molecular pathways that are involved in disease onset and progression, as well as prognosis. These advances now make it possible to create a "Molecular Disease Model" (MDM) for melanoma that classifies individual tumors into molecular subtypes (in contrast to traditional histological subtypes), with proposed treatment guidelines for each subtype including specific assays, drugs, and clinical trials. This paper describes such a Melanoma Molecular Disease Model reflecting the latest scientific, clinical, and technological advances.

    View details for DOI 10.1371/journal.pone.0018257

    View details for Web of Science ID 000289055700041

    View details for PubMedID 21479172

    View details for PubMedCentralID PMC3068163

  • Responses of psbA, hli and ptox genes to changes in irradiance in marine Synechococcus and Prochlorococcus AQUATIC MICROBIAL ECOLOGY Berg, G. M., Shrager, J., van Dijken, G., Mills, M. M., Arrigo, K. R., Grossman, A. R. 2011; 65 (1): 1-14

    View details for DOI 10.3354/ame01528

    View details for Web of Science ID 000297117200001

  • Targeted Therapy Database (TTD): A Model to Match Patient's Molecular Profile with Current Knowledge on Cancer Biology PLOS ONE Mocellin, S., Shrager, J., Scolyer, R., Pasquali, S., Verdi, D., Marincola, F. M., Briarava, M., Gobbel, R., Rossi, C., Nitti, D. 2010; 5 (8)


    The efficacy of current anticancer treatments is far from satisfactory and many patients still die of their disease. A general agreement exists on the urgency of developing molecularly targeted therapies, although their implementation in the clinical setting is in its infancy. In fact, despite the wealth of preclinical studies addressing these issues, the difficulty of testing each targeted therapy hypothesis in the clinical arena represents an intrinsic obstacle. As a consequence, we are witnessing a paradoxical situation where most hypotheses about the molecular and cellular biology of cancer remain clinically untested and therefore do not translate into a therapeutic benefit for patients. To present a computational method aimed to comprehensively exploit the scientific knowledge in order to foster the development of personalized cancer treatment by matching the patient's molecular profile with the available evidence on targeted therapy. To this aim we focused on melanoma, an increasingly diagnosed malignancy for which the need for novel therapeutic approaches is paradigmatic since no effective treatment is available in the advanced setting. Relevant data were manually extracted from peer-reviewed full-text original articles describing any type of anti-melanoma targeted therapy tested in any type of experimental or clinical model. To this purpose, Medline, Embase, Cancerlit and the Cochrane databases were searched. We created a manually annotated database (Targeted Therapy Database, TTD) where the relevant data are gathered in a formal representation that can be computationally analyzed. Dedicated algorithms were set up for the identification of the prevalent therapeutic hypotheses based on the available evidence and for ranking treatments based on the molecular profile of individual patients. In this essay we describe the principles and computational algorithms of an original method developed to fully exploit the available knowledge on cancer biology with the ultimate goal of fruitfully driving both preclinical and clinical research on anticancer targeted therapy. In the light of its theoretical nature, the prediction performance of this model must be validated before it can be implemented in the clinical setting.

    View details for DOI 10.1371/journal.pone.0011965

    View details for Web of Science ID 000280811900001

    View details for PubMedID 20706624

    View details for PubMedCentralID PMC2919374

  • The Promise and Perils of Pre-Publication Review: A Multi-Agent Simulation of Biomedical Discovery Under Varying Levels of Review Stringency PLOS ONE Shrager, J. 2010; 5 (5)


    The Internet has enabled profound changes in the way science is performed, especially in scientific communications. Among the most important of these changes is the possibility of new models for pre-publication review, ranging from the current, relatively strict peer-review model, to entirely unreviewed, instant self-publication. Different models may affect scientific progress by altering both the quality and quantity of papers available to the research community. To test how models affect the community, I used a multi-agent simulation of treatment selection and outcome in a patient population to examine how various levels of pre-publication review might affect the rate of scientific progress. I identified a "sweet spot" between the points of very limited and very strict requirements for pre-publication review. The model also produced a u-shaped curve where very limited review requirement was slightly superior to a moderate level of requirement, but not as large as the aforementioned sweet spot. This unexpected phenomenon appears to result from the community taking longer to discover the correct treatment with more strict pre-publication review. In the parameter regimens I explored, both completely unreviewed and very strictly reviewed scientific communication seems likely to hinder scientific progress. Much more investigation is warranted. Multi-agent simulations can help to shed light on complex questions of scientific communication and exhibit interesting, unexpected behaviors.

    View details for DOI 10.1371/journal.pone.0010782

    View details for PubMedID 20520812

  • Soccer Science and the Bayes Community: Exploring the Cognitive Implications of Modern Scientific Communication TOPICS IN COGNITIVE SCIENCE Shrager, J., Billman, D., Convertino, G., Massar, J. P., Pirolli, P. 2010; 2 (1): 53-72


    Science is a form of distributed analysis involving both individual work that produces new knowledge and collaborative work to exchange information with the larger community. There are many particular ways in which individual and community can interact in science, and it is difficult to assess how efficient these are, and what the best way might be to support them. This paper reports on a series of experiments in this area and a prototype implementation using a research platform called CACHE. CACHE both supports experimentation with different structures of interaction between individual and community cognition and serves as a prototype for computational support for those structures. We particularly focus on CACHE-BC, the Bayes community version of CACHE, within which the community can break up analytical tasks into "mind-sized" units and use provenance tracking to keep track of the relationship between these units.

    View details for DOI 10.1111/j.1756-8765.2009.01049.x

    View details for Web of Science ID 000283866800006

    View details for PubMedID 25163621

  • BioBIKE: A Web-based, programmable, integrated biological knowledge base NUCLEIC ACIDS RESEARCH Elhai, J., Taton, A., Massar, J. P., Myers, J. K., Travers, M., Casey, J., Slupesky, M., Shrager, J. 2009; 37: W28-W32


    BioBIKE ( is a web-based environment enabling biologists with little programming expertise to combine tools, data, and knowledge in novel and possibly complex ways, as demanded by the biological problem at hand. BioBIKE is composed of three integrated components: a biological knowledge base, a graphical programming interface and an extensible set of tools. Each of the five current BioBIKE instances provides all available information (genomic, metabolic, experimental) appropriate to a given research community. The BioBIKE programming language and graphical programming interface employ familiar operations to help users combine functions and information to conduct biologically meaningful analyses. Many commonly used tools, such as Blast and PHYLIP, are built-in, allowing users to access them within the same interface and to pass results from one to another. Users may also invent their own tools, packaging complex expressions under a single name, which is immediately made accessible through the graphical interface. BioBIKE represents a partial solution to the difficult question of how to enable those with no background in computer programming to work directly and creatively with mass biological information. BioBIKE is distributed under the MIT Open Source license. A description of the underlying language and other technical matters is available at

    View details for DOI 10.1093/nar/gkp354

    View details for Web of Science ID 000267889100007

    View details for PubMedID 19433511

    View details for PubMedCentralID PMC2703918

  • Understanding nitrogen limitation in Aureococcus anophagefferens (Pelagophyceae) through cDNA and qRT-PCR analysis JOURNAL OF PHYCOLOGY Berg, G. M., Shrager, J., Gloeckner, G., Arrigo, K. R., Grossman, A. R. 2008; 44 (5): 1235-1249


    Brown tides of the marine pelagophyte Aureococcus anophagefferens Hargraves et Sieburth have been investigated extensively for the past two decades. Its growth is fueled by a variety of nitrogen (N) compounds, with dissolved organic nitrogen (DON) being particularly important during blooms. Characterization of a cDNA library suggests that A. anophagefferens can assimilate eight different forms of N. Expression of genes related to the sensing, uptake, and assimilation of inorganic and organic N, as well as the catabolic process of autophagy, was assayed in cells grown on different N sources and in N-limited cells. Growth on nitrate elicited an increase in the relative expression of nitrate and ammonium transporters, a nutrient stress-induced transporter, and a sensory kinase. Growth on urea increased the relative expression of a urea and a formate/nitrite transporter, while growth on ammonium resulted in an increase in the relative expression of an ammonium transporter, a novel ATP-binding cassette (ABC) transporter and a putative high-affinity phosphate transporter. N limitation resulted in a 30- to 110-fold increase in the relative expression of nitrate, ammonium, urea, amino acid/polyamine, and formate/nitrite transporters. A. anophagefferens demonstrated the highest relative accumulation of a transcript encoding a novel purine transporter, which was highly expressed across all N sources. This finding suggests that purines are an important source of N for the growth of this organism and could possibly contribute to the initiation and maintenance of blooms in the natural environment.

    View details for DOI 10.1111/j.1529-8817.2008.00571.x

    View details for Web of Science ID 000259866800015

  • Keeping the collectivity in mind? PHENOMENOLOGY AND THE COGNITIVE SCIENCES Collins, H., Clark, A., Shrager, J. 2008; 7 (3): 353-374
  • The CACHE Study: Group Effects in Computer-supported Collaborative Analysis COMPUTER SUPPORTED COOPERATIVE WORK-THE JOURNAL OF COLLABORATIVE COMPUTING Convertino, G., Billman, D., Pirolli, P., Massar, J. P., Shrager, J. 2008; 17 (4): 353-393
  • Alternative photosynthetic electron flow to oxygen in marine Synechococcus BIOCHIMICA ET BIOPHYSICA ACTA-BIOENERGETICS Bailey, S., Melis, A., Mackey, K. R., Cardol, P., Finazzi, G., van Dijken, G., Berg, G. M., Arrigo, K., Shrager, J., Grossman, A. 2008; 1777 (3): 269-276


    Cyanobacteria dominate the world's oceans where iron is often barely detectable. One manifestation of low iron adaptation in the oligotrophic marine environment is a decrease in levels of iron-rich photosynthetic components, including the reaction center of photosystem I and the cytochrome b6f complex [R.F. Strzepek and P.J. Harrison, Photosynthetic architecture differs in coastal and oceanic diatoms, Nature 431 (2004) 689-692.]. These thylakoid membrane components have well characterised roles in linear and cyclic photosynthetic electron transport and their low abundance creates potential impediments to photosynthetic function. Here we show that the marine cyanobacterium Synechococcus WH8102 exhibits significant alternative electron flow to O2, a potential adaptation to the low iron environment in oligotrophic oceans. This alternative electron flow appears to extract electrons from the intersystem electron transport chain, prior to photosystem I. Inhibitor studies demonstrate that a propyl gallate-sensitive oxidase mediates this flow of electrons to oxygen, which in turn alleviates excessive photosystem II excitation pressure that can often occur even at relatively low irradiance. These findings are also discussed in the context of satisfying the energetic requirements of the cell when photosystem I abundance is low.

    View details for DOI 10.1016/j.bbabio.2008.01.002

    View details for Web of Science ID 000254674600004

    View details for PubMedID 18241667

  • Taskpose: Exploring Fluid Boundaries in an Associative Window Visualization 21st Annual ACM Symposium on User Interface Software and Technology Bernstein, M., Shrager, J., Winograd, T. ASSOC COMPUTING MACHINERY. 2008: 231–234
  • The evolution of BioBike: Community adaptation of a biocomputing platform STUDIES IN HISTORY AND PHILOSOPHY OF SCIENCE Shrager, J. 2007; 38 (4): 642-656
  • Deductive Biocomputing PLOS ONE Shrager, J., Waldinger, R., Stickel, M., Massar, J. P. 2007; 2 (4)


    As biologists increasingly rely upon computational tools, it is imperative that they be able to appropriately apply these tools and clearly understand the methods the tools employ. Such tools must have access to all the relevant data and knowledge and, in some sense, "understand" biology so that they can serve biologists' goals appropriately and "explain" in biological terms how results are computed. We describe a deduction-based approach to biocomputation that semiautomatically combines knowledge, software, and data to satisfy goals expressed in a high-level biological language. The approach is implemented in an open source web-based biocomputing platform called BioDeducta, which combines SRI's SNARK theorem prover with the BioBike interactive integrated knowledge base. The biologist/user expresses a high-level conjecture, representing a biocomputational goal query, without indicating how this goal is to be achieved. A subject domain theory, represented in SNARK's logical language, transforms the terms in the conjecture into capabilities of the available resources and the background knowledge necessary to link them together. If the subject domain theory enables SNARK to prove the conjecture--that is, to find paths between the goal and BioBike resources--then the resulting proofs represent solutions to the conjecture/query. Such proofs provide provenance for each result, indicating in detail how they were computed. We demonstrate BioDeducta by showing how it can approximately replicate a previously published analysis of genes involved in the adaptation of cyanobacteria to different light niches. Through the use of automated deduction guided by a biological subject domain theory, this work is a step towards enabling biologists to conveniently and efficiently marshal integrated knowledge, data, and computational tools toward resolving complex biological queries.

    View details for DOI 10.1371/journal.pone.0000339

    View details for Web of Science ID 000207445300001

    View details for PubMedID 17415407

    View details for PubMedCentralID PMC1838522

  • EST assembly supported by a draft genome sequence: an analysis of the Chlamydomonas reinhardtii transcriptome NUCLEIC ACIDS RESEARCH Jain, M., Shrager, J., Harris, E. H., Halbrook, R., Grossman, A. R., Hauser, C., Vallon, O. 2007; 35 (6): 2074-2083


    Clustering and assembly of expressed sequence tags (ESTs) constitute the basis for most genomewide descriptions of a transcriptome. This approach is limited by the decline in sequence quality toward the end of each EST, impacting both sequence clustering and assembly. Here, we exploit the available draft genome sequence of the unicellular green alga Chlamydomonas reinhardtii to guide clustering and to correct errors in the ESTs. We have grouped all available EST and cDNA sequences into 12,063 ACEGs (assembly of contiguous ESTs based on genome) and generated 15,857 contigs of average length 934 nt. We predict that roughly 3000 of our contigs represent full-length transcripts. Compared to previous assemblies, ACEGs show extended contig length, increased accuracy and a reduction in redundancy. Because our assembly protocol also uses ESTs with no corresponding genomic sequences, it provides sequence information for genes interrupted by sequence gaps. Detailed analysis of randomly sampled ACEGs reveals several hundred putative cases of alternative splicing, many overlapping transcription units and new genes not identified by gene prediction algorithms. Our protocol, although developed for and tailored to the C. reinhardtii dataset, can be exploited by any eukaryotic genome project for which both a draft genome sequence and ESTs are available.

    View details for DOI 10.1093/nar/gkm081

    View details for Web of Science ID 000246123600038

    View details for PubMedID 17355987

    View details for PubMedCentralID PMC1874618

  • Constructing explanatory process models from biological data and knowledge ARTIFICIAL INTELLIGENCE IN MEDICINE Langley, P., Shiran, O., Shrager, J., Todorovski, L., Pohorille, A. 2006; 37 (3): 191-201


    We address the task of inducing explanatory models from observations and knowledge about candidate biological processes, using the illustrative problem of modeling photosynthesis regulation. We cast both models and background knowledge in terms of processes that interact to account for behavior. We also describe IPM, an algorithm for inducing quantitative process models from such input. We demonstrate IPM's use both on photosynthesis and on a second domain, biochemical kinetics, reporting the models induced and their fit to observations. We consider the generality of our approach, discuss related research on biological modeling, and suggest directions for future work.

    View details for DOI 10.1016/j.artmed.2006.04.003

    View details for PubMedID 16781850

  • Examination of diel changes in global transcript accumulation in Synechocystis (cyanobacteria) JOURNAL OF PHYCOLOGY Labiosa, R. G., Arrigo, K. R., Tu, C. J., Bhaya, D., Bay, S., Grossman, A. R., Shrager, J. 2006; 42 (3): 622-636
  • Generation of an oligonucleotide array for analysis of gene expression in Chlamydomonas reinhardtii CURRENT GENETICS Eberhard, S., Jain, M., Im, C. S., Pollock, S., Shrager, J., Lin, Y. A., Peek, A. S., Grossman, A. R. 2006; 49 (2): 106-124


    The availability of genome sequences makes it possible to develop microarrays that can be used for profiling gene expression over developmental time, as organisms respond to environmental challenges, and for comparison between wild-type and mutant strains under various conditions. The desired characteristics of microarrays (intense signals, hybridization specificity and extensive coverage of the transcriptome) were not fully met by the previous Chlamydomonas reinhardtii microarray: probes derived from cDNA sequences (approximately 300 bp) were prone to some nonspecific cross-hybridization and coverage of the transcriptome was only approximately 20%. The near completion of the C. reinhardtii nuclear genome sequence and the availability of extensive cDNA information have made it feasible to improve upon these aspects. After developing a protocol for selecting a high-quality unigene set representing all known expressed sequences, oligonucleotides were designed and a microarray with approximately 10,000 unique array elements (approximately 70 bp) covering 87% of the known transcriptome was developed. This microarray will enable researchers to generate a global view of gene expression in C. reinhardtii. Furthermore, the detailed description of the protocol for selecting a unigene set and the design of oligonucleotides may be of interest for laboratories interested in developing microarrays for organisms whose genome sequences are not yet completed (but are nearing completion).

    View details for DOI 10.1007/s00294-005-0041-2

    View details for Web of Science ID 000234906200004

    View details for PubMedID 16333659

  • A hybrid, recursive algorithm for clustering expressed sequence tags in Chlamydomonas reinhardtii 18th International Conference on Pattern Recognition (ICPR 2006) Jain, M., Holz, H., Shrager, J., Vallon, O., Hauser, C., Grossman, A. IEEE COMPUTER SOC. 2006: 404–407
  • BioLingua: a programmable knowledge environment for biologists BIOINFORMATICS Massar, J. P., Travers, M., Elhai, J., Shrager, J. 2005; 21 (2): 199-207


    BioLingua is an interactive, web-based programming environment that enables biologists to analyze biological systems by combining knowledge and data through direct end-user programming. BioLingua embeds a mature symbolic programming language in a frame-based knowledge environment, integrating genomic and pathway knowledge about a class of similar organisms. The BioLingua language provides interfaces to numerous state-of-the-art bioinformatic tools, making these available as an integrated package through the novel use of web-based programmability and an integrated Wiki-based community code and data store. The pilot instantiation of BioLingua, which has been developed in collaboration with several cyanobacteriologists, integrates knowledge about a subset of cyanobacteria with the Gene Ontology, KEGG and BioCyc knowledge bases. We introduce the BioLingua concept, architecture and language, and give several examples of its use in complex analyses. Extensive documentation is available online at

    View details for DOI 10.1093/bioinformatics/bth465

    View details for Web of Science ID 000226308500008

    View details for PubMedID 15308539

  • Temporal aggregation bias and inference of causal regulatory networks JOURNAL OF COMPUTATIONAL BIOLOGY Bay, S. D., Chrisman, L., Pohorille, A., Shrager, J. 2004; 11 (5): 971-985


    Time course experiments with microarrays have begun to provide a glimpse into the dynamic behavior of gene expression. In a typical experiment, scientists use microarrays to measure the abundance of mRNA at discrete time points after the onset of a stimulus. Recently, there has been much work on using these data to infer causal regulatory networks that model how genes influence each other. However, microarray studies typically have slow sampling rates that can lead to temporal aggregation of the signal. That is, each successive sampling point represents the sum of all signal changes since the previous sample. In this paper, we show that temporal aggregation can bias algorithms for causal inference and lead them to discover spurious relations that would not be found if the signal were sampled at a much faster rate. We discuss the implications of temporal aggregation on inference, the problems it creates, and potential directions for solutions.

    View details for Web of Science ID 000225090800011

    View details for PubMedID 15700412

  • Insights into the survival of Chlamydomonas reinhardtii during sulfur starvation based on microarray analysis of gene expression EUKARYOTIC CELL Zhang, Z. D., Shrager, J., Jain, M., Chang, C. W., Vallon, O., Grossman, A. R. 2004; 3 (5): 1331-1348


    Responses of photosynthetic organisms to sulfur starvation include (i) increasing the capacity of the cell for transporting and/or assimilating exogenous sulfate, (ii) restructuring cellular features to conserve sulfur resources, and (iii) modulating metabolic processes and rates of cell growth and division. We used microarray analyses to obtain a genome-level view of changes in mRNA abundances in the green alga Chlamydomonas reinhardtii during sulfur starvation. The work confirms and extends upon previous findings showing that sulfur deprivation elicits changes in levels of transcripts for proteins that help scavenge sulfate and economize on the use of sulfur resources. Changes in levels of transcripts encoding members of the light-harvesting polypeptide family, such as LhcSR2, suggest restructuring of the photosynthetic apparatus during sulfur deprivation. There are also significant changes in levels of transcripts encoding enzymes involved in metabolic processes (e.g., carbon metabolism), intracellular proteolysis, and the amelioration of oxidative damage; a marked and sustained increase in mRNAs for a putative vanadium chloroperoxidase and a peroxiredoxin may help prolong survival of C. reinhardtii during sulfur deprivation. Furthermore, many of the sulfur stress-regulated transcripts (encoding polypeptides associated with sulfate uptake and assimilation, oxidative stress, and photosynthetic function) are not properly regulated in the sac1 mutant of C. reinhardtii, a strain that dies much more rapidly than parental cells during sulfur deprivation. Interestingly, sulfur stress elicits dramatic changes in levels of transcripts encoding putative chloroplast-localized chaperones in the sac1 mutant but not in the parental strain. These results suggest various strategies used by photosynthetic organisms during acclimation to nutrient-limited growth.

    View details for DOI 10.1128/EC.3.5.1331-1348.2004

    View details for Web of Science ID 000224822300027

    View details for PubMedID 15470261

    View details for PubMedCentralID PMC522608

  • Consequences of a deletion in dspA on transcript accumulation in Synechocystis sp strain PCC6803 JOURNAL OF BACTERIOLOGY Tu, C. J., Shrager, J., Burnap, R. L., Postier, B. L., Grossman, A. R. 2004; 186 (12): 3889-3902


    A sensor histidine kinase of Synechococcus sp. strain PCC7942, designated nblS, was previously identified and shown to be critical for the acclimation of cells to high-light and nutrient limitation conditions and to influence the expression of a number of light-responsive genes. The nblS orthologue in Synechocystis sp. strain PCC6803 is designated dspA (also called hik33). We have generated a dspA null mutant and analyzed global gene expression in both the mutant and wild-type strains under high- and low-light conditions. The mutant is aberrant for the expression of many genes encoding proteins critical for photosynthesis, phosphate and carbon acquisition, and the amelioration of stress conditions. Furthermore, transcripts from a number of genes normally detected only during exposure of wild-type cells to high-light conditions become partially constitutive in the low-light-grown dspA mutant. Other genes for which transcripts decline upon exposure of wild-type cells to high light are already lower in the mutant during growth in low light. These results suggest that DspA may influence gene expression in both a positive and a negative manner and that the dspA mutant behaves as if it were experiencing stress conditions (e.g., high-light exposure) even when maintained at near-optimal growth conditions for wild-type cells. This is discussed with respect to the importance of DspA for regulating the responses of the cell to environmental cues.

    View details for Web of Science ID 000221869100026

    View details for PubMedID 15175303

    View details for PubMedCentralID PMC419946

  • Chlamydomonas reinhardtii at the crossroads of genomics EUKARYOTIC CELL Grossman, A. R., Harris, E. E., Hauser, C., Lefebvre, P. A., Martinez, D., Rokhsar, D., Shrager, J., Silflow, C. D., Stern, D., Vallon, O., Zhang, Z. D. 2003; 2 (6): 1137-1150

    View details for DOI 10.1128/EC.2.6.1137-1150.2003

    View details for Web of Science ID 000187363500001

    View details for PubMedID 14665449

    View details for PubMedCentralID PMC326643

  • The fiction of function BIOINFORMATICS Shrager, J. 2003; 19 (15): 1934–36

  • Chlamydomonas reinhardtii genome project. A guide to the generation and use of the cDNA information PLANT PHYSIOLOGY Shrager, J., Hauser, C., Chang, C. W., Harris, E. H., Davies, J., McDermott, J., Tamse, R., Zhang, Z. D., Grossman, A. R. 2003; 131 (2): 401-408


    The National Science Foundation-funded Chlamydomonas reinhardtii genome project involves (a) construction and sequencing of cDNAs isolated from cells exposed to various environmental conditions, (b) construction of a high-density cDNA microarray, (c) generation of genomic contigs that are nucleated around specific physical and genetic markers, (d) generation of a complete chloroplast genome sequence and analyses of chloroplast gene expression, and (e) the creation of a Web-based resource that allows for easy access of the information in a format that can be readily queried. Phases of the project performed by the groups at the Carnegie Institution and Duke University involve the generation of normalized cDNA libraries, sequencing of cDNAs, analysis and assembly of these sequences to generate contigs and a set of predicted unique genes, and the use of this information to construct a high-density DNA microarray. In this paper, we discuss techniques involved in obtaining cDNA end-sequence information and the ways in which this information is assembled and analyzed. Descriptions of protocols for preparing cDNA libraries, assembling cDNA sequences and annotating the sequence information are provided (the reader is directed to Web sites for more detailed descriptions of these methods). We also discuss preliminary results in which the different cDNA libraries are used to identify genes that are potentially differentially expressed.

    View details for DOI 10.1104/pp.016899

    View details for Web of Science ID 000181005000003

    View details for PubMedID 12586865

    View details for PubMedCentralID PMC166817

  • Analysis of light and CO2 regulation in Chlamydomonas reinhardtii using genome-wide approaches PHOTOSYNTHESIS RESEARCH Im, C. S., Zhang, Z. D., Shrager, J., Chang, C. W., Grossman, A. R. 2003; 75 (2): 111-125


    Over the past decade new technologies have been developed to elucidate ways in which cells acclimate to environmental change. Many of these techniques have allowed the identification of specific transcripts that change in abundance in response to particular environmental stimuli; such transcripts represent genes that are potentially differentially regulated. Two techniques that foster identification of differentially regulated genes are differential display and expression profiling using high density DNA microarrays. The former technology amplifies cDNA fragments from mRNAs that differentially accumulate under specific environmental conditions, while the latter provides a more global view of changes in gene expression in response to environmental stimuli. Coupling these technologies with the analysis of mutants aberrant for regulatory molecules that participate in acclimation processes will allow the identification of groups of genes controlled by specific regulatory elements. In this article we describe the use of differential display and DNA microarray profiling to examine environmentally regulated gene expression. We also show specific experiments using the unicellular green alga Chlamydomonas reinhardtii, in which mRNA abundance is evaluated in response to both changing light and CO2 conditions.

    View details for Web of Science ID 000181506700002

  • Inducing biological models from temporal gene expression data 6th International Conference on Discovery Science Saito, K., George, D., Bay, S., Shrager, J. SPRINGER-VERLAG BERLIN. 2003: 468–469

  • Revising regulatory networks: from expression data to linear causal models JOURNAL OF BIOMEDICAL INFORMATICS Bay, S. D., Shrager, J., Pohorille, A., Langley, P. 2002; 35 (5-6): 289-297


    Discovering the complex regulatory networks that govern mRNA expression is an important but difficult problem. Many current approaches use only expression data from microarrays to infer the likely network structure. However, this ignores much existing knowledge because for a given organism and system under study, a biologist may already have a partial model of gene regulation. We propose a method for revising and improving these initial models, which may be incomplete or partially incorrect, with expression data. We demonstrate our approach by revising a model of photosynthesis regulation proposed by a biologist for Cyanobacteria. Applied to wild type expression data, our system suggested several modifications consistent with biological knowledge. Applied to a mutant strain, our system correctly modified the disabled gene. Power experiments with synthetic data indicate that reliable revision is feasible even with a small number of samples.

    View details for DOI 10.1016/S1532-0464(03)00031-5

    View details for Web of Science ID 000184879000002

    View details for PubMedID 12968777

  • Guiding revision of regulatory models with expression data. Pacific Symposium on Biocomputing Shrager, J., Langley, P., Pohorille, A. 2002: 486-497


    BioLingua is a computational system designed to support biologists' efforts to construct models, make predictions, and interpret data. In this paper, we focus on the specific task of revising an initial model of gene regulation based on expression levels from gene microarrays. We describe BioLingua's formalism for representing process models, its method for predicting qualitative correlations from such models, and its use of data to constrain search through the space of revised models. We also report experimental results on revising a model of photosynthetic regulation in Cyanobacteria to better fit expression data for both wild and mutant strains, along with model mutilation studies designed to test our method's robustness. In closing, we discuss related work on representing, discovering, and revising biological models, after which we propose some directions for future research.

    View details for PubMedID 11928501

  • Parent-child collaborative explanations: Methods of identification and analysis JOURNAL OF THE LEARNING SCIENCES Callanan, M. A., Shrager, J., Moore, J. L. 1995; 4 (1): 105-129