Academic Appointments

Professional Education

  • PostDoc, Institute for Systems Biology, Proteomics & Systems Biology (Mentor: Ruedi Aebersold)
  • Ph.D., University of California, Los Angeles, Chemistry & Biochemistry (Mentor: David Eisenberg)
  • B.S., Washington University in St. Louis, Computer Science & Biochemistry

Current Research and Scholarly Interests

The Mallick lab focuses on translating multi-omic discovery into precision diagnostics. In particular, we use tightly integrated computational and experimental multi-omic approaches to discover the processes underlying how cells behave (or misbehave) and, accordingly, how cancers develop and grow. We hope that by exploring these processes, and by formalizing our knowledge in predictive mathematical models, we will be better able to identify biomarkers that can be used to detect cancers earlier and to describe how they are likely to behave (e.g. aggressive vs. indolent, drug-sensitive vs. drug-resistant).

More specifically, we are working in three focus areas: Cancer Systems Biology, Multi-scale Biomarker Biology and Technology Development. Notably, many of the studies in our group are investigating fundamental physiological processes and thus are generally applicable to a range of cell-types and diseases.

Our group has also been leading the development of ProteoWizard, an open-source set of libraries and tools that simplifies the development of proteomics software. These tools read and write the HUPO-PSI mzML standard and have been incorporated into the ISB's Trans-Proteomic Pipeline.

All Publications

  • The Impact of Microenvironmental Heterogeneity on the Evolution of Drug Resistance in Cancer Cells. Cancer informatics Mumenthaler, S. M., Foo, J., Choi, N. C., Heise, N., Leder, K., Agus, D. B., Pao, W., Michor, F., Mallick, P. 2015; 14: 19-31


    Therapeutic resistance arises as a result of evolutionary processes driven by dynamic feedback between a heterogeneous cell population and environmental selective pressures. Previous studies have suggested that mutations conferring resistance to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI) in non-small-cell lung cancer (NSCLC) cells lower the fitness of resistant cells relative to drug-sensitive cells in a drug-free environment. Here, we hypothesize that the local tumor microenvironment could influence the magnitude and directionality of the selective effect, both in the presence and absence of a drug. Using a combined experimental and computational approach, we developed a mathematical model of preexisting drug resistance describing multiple cellular compartments, each representing a specific tumor environmental niche. This model was parameterized using a novel experimental dataset derived from the HCC827 erlotinib-sensitive and -resistant NSCLC cell lines. We found that, in contrast to in the drug-free environment, resistant cells may hold a fitness advantage compared to parental cells in microenvironments deficient in oxygen and nutrients. We then utilized the model to predict the impact of drug and nutrient gradients on tumor composition and recurrence times, demonstrating that these endpoints are strongly dependent on the microenvironment. Our interdisciplinary approach provides a model system to quantitatively investigate the impact of microenvironmental effects on the evolutionary dynamics of tumor cells.

    View details for DOI 10.4137/CIN.S19338

    View details for PubMedID 26244007

  • A cross-platform toolkit for mass spectrometry and proteomics NATURE BIOTECHNOLOGY Chambers, M. C., MacLean, B., Burke, R., Amodei, D., Ruderman, D. L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J., Hoff, K., Kessner, D., Tasman, N., Shulman, N., Frewen, B., Baker, T. A., Brusniak, M., Paulse, C., Creasy, D., Flashner, L., Kani, K., Moulding, C., Seymour, S. L., Nuwaysir, L. M., Lefebvre, B., Kuhlmann, F., Roark, J., Rainer, P., Detlev, S., Hemenway, T., Huhmer, A., Langridge, J., Connolly, B., Chadick, T., Holly, K., Eckels, J., Deutsch, E. W., Moritz, R. L., Katz, J. E., Agus, D. B., MacCoss, M., Tabb, D. L., Mallick, P. 2012; 30 (10): 918-920

    View details for DOI 10.1038/nbt.2377

    View details for Web of Science ID 000309965500011

    View details for PubMedID 23051804

  • Impact of Protein Stability, Cellular Localization, and Abundance on Proteomic Detection of Tumor-Derived Proteins in Plasma PLOS ONE Fang, Q., Kani, K., Faca, V. M., Zhang, W., Zhang, Q., Jain, A., Hanash, S., Agus, D. B., McIntosh, M. W., Mallick, P. 2011; 6 (7)


    Tumor-derived, circulating proteins are potentially useful as biomarkers for detection of cancer, for monitoring of disease progression, regression and recurrence, and for assessment of therapeutic response. Here we interrogated how a protein's stability, cellular localization, and abundance affect its observability in blood by mass-spectrometry-based proteomics techniques. We performed proteomic profiling on tumors and plasma from two different xenograft mouse models. A statistical analysis of this data revealed protein properties indicative of the detection level in plasma. Though 20% of the proteins identified in plasma were tumor-derived, only 5% of the proteins observed in the tumor tissue were found in plasma. Both intracellular and extracellular tumor proteins were observed in plasma; however, after normalizing for tumor abundance, extracellular proteins were seven times more likely to be detected. Although proteins that were more abundant in the tumor were also more likely to be observed in plasma, the relationship was nonlinear: Doubling the spectral count increased detection rate by only 50%. Many secreted proteins, even those with relatively low spectral count, were observed in plasma, but few low abundance intracellular proteins were observed. Proteins predicted to be stable by dipeptide composition were significantly more likely to be identified in plasma than less stable proteins. The number of tryptic peptides in a protein was not significantly related to the chance of a protein being observed in plasma. Quantitative comparison of large versus small tumors revealed that the abundance of proteins in plasma as measured by spectral count was associated with the tumor size, but the relationship was not one-to-one; a 3-fold decrease in tumor size resulted in a 16-fold decrease in protein abundance in plasma. 
    This study provides quantitative support for a tumor-derived marker prioritization strategy that favors secreted and stable proteins over all but the most abundant intracellular proteins.

    View details for DOI 10.1371/journal.pone.0023090

    View details for Web of Science ID 000293286500074

    View details for PubMedID 21829587
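The two "nonlinear" relationships quoted in the abstract above can be read as scaling exponents. The sketch below is an illustration only: it assumes a simple power law (y proportional to x^k), which the abstract does not state, and the function name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(fold_change_x: float, fold_change_y: float) -> float:
    """Return k such that y is proportional to x**k.

    Assumes a power-law relationship (an assumption of this sketch,
    not a claim made in the abstract).
    """
    return math.log(fold_change_y) / math.log(fold_change_x)


# Doubling the spectral count raised the detection rate by 50% (a factor of 1.5).
k_detection = scaling_exponent(2.0, 1.5)

# A 3-fold change in tumor size went with a 16-fold change in plasma abundance.
k_tumor_size = scaling_exponent(3.0, 16.0)

print(f"detection rate vs. spectral count: k = {k_detection:.3f}")
print(f"plasma abundance vs. tumor size:   k = {k_tumor_size:.3f}")
```

Under that assumption, detection rate grows sublinearly with tumor abundance (k below 1), while plasma abundance falls much faster than tumor size (k above 2).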

  • Computational prediction of proteotypic peptides for quantitative proteomics. Nature biotechnology Mallick, P., Schirle, M., Chen, S. S., Flory, M. R., Lee, H., Martin, D., Ranish, J., Raught, B., Schmitt, R., Werner, T., Kuster, B., Aebersold, R. 2007; 25 (1): 125-131


    Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Although such analyses typically assume that a protein's peptide fragments are observed with equal likelihood, only a few so-called 'proteotypic' peptides are repeatedly and consistently identified for any given protein present in a mixture. Using >600,000 peptide identifications generated by four proteomic platforms, we empirically identified >16,000 proteotypic peptides for 4,030 distinct yeast proteins. Characteristic physicochemical properties of these peptides were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism, for a given platform, with >85% cumulative accuracy. Possible applications of proteotypic peptides include validation of protein identifications, absolute quantification of proteins, annotation of coding sequences in genomes, and characterization of the physical principles governing key elements of mass spectrometric workflows (e.g., digestion, chromatography, ionization and fragmentation).

    View details for PubMedID 17195840

  • JUN-Mediated downregulation of EGFR signaling is associated with resistance to gefitinib in EGFR-mutant NSCLC cell lines. Molecular cancer therapeutics Kani, K., Garri, C., Tiemann, K., Malihi, P. D., Punj, V., Nguyen, A. L., Lee, J., Hughes, L. D., Alvarez, R. M., Wood, D. M., Joo, A. Y., Katz, J. E., Agus, D. B., Mallick, P. 2017


    Mutations or deletions in exons 18-21 in the epidermal growth factor receptor (EGFR) are present in approximately 15% of tumors in patients with non-small cell lung cancer (NSCLC). They lead to activation of the EGFR kinase domain and sensitivity to molecularly targeted therapeutics aimed at this domain (gefitinib or erlotinib). These drugs have demonstrated objective clinical response in many of these patients; however, invariably, all patients acquire resistance. To examine the molecular origins of resistance, we derived a set of gefitinib-resistant cells by exposing the lung adenocarcinoma cell line HCC827, which carries an activating mutation in the EGFR tyrosine kinase domain, to increasing gefitinib concentrations. Gefitinib-resistant cells acquired an increased expression and activation of JUN, a known oncogene involved in cancer progression. Ectopic overexpression of JUN in HCC827 cells increased gefitinib IC50 from 49 nM to 8 μM (p < 0.001). Downregulation of JUN expression through shRNA re-sensitized HCC827 cells to gefitinib (IC50 from 49 nM to 2 nM; p < 0.01). Inhibitors targeting JUN were three-fold more effective in the gefitinib-resistant cells than in the parental cell line (p < 0.01). Analysis of gene expression in patient tumors with EGFR activating mutations and poor response to erlotinib revealed a similar pattern as the top 260 differentially expressed genes in the gefitinib-resistant cells (Spearman correlation coefficient of 0.78, p < 0.01). These findings suggest that increased JUN expression and activity may contribute to gefitinib resistance in NSCLC and that JUN pathway therapeutics merit investigation as an alternate treatment strategy.

    View details for DOI 10.1158/1535-7163.MCT-16-0564

    View details for PubMedID 28566434
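To put the IC50 shifts reported in the abstract above on a common scale, the following sketch converts them to fold changes. Only the IC50 values (49 nM, 8 μM, 2 nM) come from the abstract; the variable and function names are hypothetical.

```python
def fold_change(reference_nm: float, other_nm: float) -> float:
    """Ratio of two IC50 values expressed in the same unit (nM)."""
    return other_nm / reference_nm


PARENTAL_IC50_NM = 49.0               # HCC827, as reported
JUN_OVEREXPRESSION_IC50_NM = 8_000.0  # 8 uM converted to nM
JUN_KNOCKDOWN_IC50_NM = 2.0           # after shRNA knockdown of JUN

# How much more drug is needed once JUN is overexpressed.
resistance_fold = fold_change(PARENTAL_IC50_NM, JUN_OVEREXPRESSION_IC50_NM)

# How much less drug is needed once JUN is knocked down.
sensitization_fold = fold_change(JUN_KNOCKDOWN_IC50_NM, PARENTAL_IC50_NM)

print(f"JUN overexpression: ~{resistance_fold:.0f}-fold higher IC50")
print(f"JUN knockdown: ~{sensitization_fold:.1f}-fold lower IC50")
```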

  • A Robust Protocol for Protein Extraction and Digestion. Methods in molecular biology (Clifton, N.J.) Atallah, M., Flory, M. R., Mallick, P. 2017; 1550: 1-10


    Proteins play a key role in all aspects of cellular homeostasis. Proteomics, the large-scale study of proteins, provides in-depth data on protein properties, including abundances and post-translational modification states, and as such provides a rich avenue for the investigation of biological and disease processes. While proteomic tools such as mass spectrometry have enabled exquisitely sensitive sample analysis, sample preparation remains a critical unstandardized variable that can have a significant impact on downstream data readouts. Consistency in sample preparation and handling is therefore paramount in the collection and analysis of proteomic data. Here we describe methods for performing protein extraction from cell culture or tissues, digesting the isolated protein into peptides via in-solution enzymatic digest, and peptide cleanup with final preparations for analysis via liquid chromatography-mass spectrometry. These protocols have been optimized and standardized for maximum consistency and maintenance of sample integrity.

    View details for DOI 10.1007/978-1-4939-6747-6_1

    View details for PubMedID 28188518

  • Data Conversion with ProteoWizard msConvert. Methods in molecular biology (Clifton, N.J.) Adusumilli, R., Mallick, P. 2017; 1550: 339-368


    Recent advances in proteome informatics have led to an explosion in tools to analyze mass spectrometry data. These tools operate across the analysis pipeline doing everything from assessing quality control to matching peptides to spectra to quantification. Unfortunately, the vast majority of these tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Consequently, the first step in many protocols is the conversion of data from vendor-specific binary files to open-format files. This protocol details the use of ProteoWizard's msConvert and msConvertGUI software for this conversion, taking format features, coding options, and vendor particularities into account. We specifically describe the various options available when doing conversions and the implications of each option.

    View details for DOI 10.1007/978-1-4939-6747-6_23

    View details for PubMedID 28188540
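As a rough illustration of the conversion step described in the abstract above, the sketch below assembles an msConvert command line that writes mzML and optionally applies centroiding. The helper name `build_msconvert_command`, the file names, and the choice of options are assumptions made for this example; the protocol itself should be consulted for which options suit a given instrument and vendor format.

```python
import shlex
from pathlib import Path


def build_msconvert_command(vendor_file: Path, out_dir: Path,
                            centroid: bool = True) -> list[str]:
    """Assemble (but do not run) an msconvert invocation.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    cmd = ["msconvert", str(vendor_file), "--mzML", "-o", str(out_dir)]
    if centroid:
        # Peak picking (centroiding) on all MS levels, written in
        # msconvert's filter syntax.
        cmd += ["--filter", "peakPicking true 1-"]
    return cmd


# Example with placeholder file names.
cmd = build_msconvert_command(Path("sample.raw"), Path("converted"))
print(shlex.join(cmd))
```

Building the argument list separately from running it makes the chosen options easy to log or review; the list could then be handed to `subprocess.run` on a machine where ProteoWizard is installed.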

  • Dual transcript and protein quantification in a massive single cell array. Lab on a chip Park, S., Lee, J. Y., Hong, S., Lee, S. H., Dimov, I. K., Lee, H., Suh, S., Pan, Q., Li, K., Wu, A. M., Mumenthaler, S. M., Mallick, P., Lee, L. P. 2016; 16 (19): 3682-3688


    Recently, single-cell molecular analysis has been leveraged to achieve unprecedented levels of biological investigation. However, a lack of simple, high-throughput single-cell methods has hindered in-depth population-wide studies with single-cell resolution. We report a microwell-based cytometric method for simultaneous measurements of gene and protein expression dynamics in thousands of single cells. We quantified the regulatory effects of transcriptional and translational inhibitors on cMET mRNA and cMET protein in cell populations. We studied the dynamic responses of individual cells to drug treatments, by measuring cMET overexpression levels in individual non-small cell lung cancer (NSCLC) cells with induced drug resistance. Across NSCLC cell lines with a given protein expression, distinct patterns of transcript-protein correlation emerged. We believe this platform is applicable for interrogating the dynamics of gene expression, protein expression, and translational kinetics at the single-cell level - a paradigm shift in life science and medicine toward discovering vital cell regulatory mechanisms.

    View details for DOI 10.1039/c6lc00762g

    View details for PubMedID 27546183

    View details for PubMedCentralID PMC5221609

  • Single cell dynamic phenotyping SCIENTIFIC REPORTS Patsch, K., Chiu, C., Engeln, M., Agus, D. B., Mallick, P., Mumenthaler, S. M., Ruderman, D. 2016; 6


    Live cell imaging has improved our ability to measure phenotypic heterogeneity. However, bottlenecks in imaging and image processing often make it difficult to differentiate interesting biological behavior from technical artifact. Thus there is a need for new methods that improve data quality without sacrificing throughput. Here we present a 3-step workflow to improve dynamic phenotype measurements of heterogeneous cell populations. We provide guidelines for image acquisition, phenotype tracking, and data filtering to remove erroneous cell tracks using the novel Tracking Aberration Measure (TrAM). Our workflow is broadly applicable across imaging platforms and analysis software. By applying this workflow to cancer cell assays, we reduced aberrant cell track prevalence from 17% to 2%. The cost of this improvement was removing 15% of the well-tracked cells. This enabled detection of significant motility differences between cell lines. Similarly, we avoided detecting a false change in translocation kinetics by eliminating the true cause: varied proportions of unresponsive cells. Finally, by systematically seeking heterogeneous behaviors, we detected subpopulations that otherwise could have been missed, including early apoptotic events and pre-mitotic cells. We provide optimized protocols for specific applications and step-by-step guidelines for adapting them to a variety of biological systems.

    View details for DOI 10.1038/srep34785

    View details for Web of Science ID 000384765700001

    View details for PubMedID 27708391

  • Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection PROTEOMICS van de Ven, S. M., Bemis, K. D., Lau, K., Adusumilli, R., Kota, U., Stolowitz, M., Vitek, O., Mallick, P., Gambhir, S. S. 2016; 16 (11-12): 1660-1669


    MALDI mass spectrometry imaging (MSI) is emerging as a tool for protein and peptide imaging across tissue sections. Despite extensive study, there does not yet exist a baseline study evaluating the potential capabilities for this technique to detect diverse proteins in tissue sections. In this study, we developed a systematic approach for characterizing MALDI-MSI workflows in terms of limits of detection, coefficients of variation, spatial resolution, and the identification of endogenous tissue proteins. Our goal was to quantify these figures of merit for a number of different proteins and peptides, in order to gain more insight in the feasibility of protein biomarker discovery efforts using this technique. Control proteins and peptides were deposited in serial dilutions on thinly sectioned mouse xenograft tissue. Using our experimental setup, coefficients of variation were <30% on tissue sections and spatial resolution was 200 μm (or greater). Limits of detection for proteins and peptides on tissue were in the micromolar to millimolar range. Protein identification was only possible for proteins present in high abundance in the tissue. These results provide a baseline for the application of MALDI-MSI towards the discovery of new candidate biomarkers and a new benchmarking strategy that can be used for comparing diverse MALDI-MSI workflows.

    View details for DOI 10.1002/pmic.201500515

    View details for Web of Science ID 000379049100008

    View details for PubMedID 26970438

  • Epigenetic changes mediated by polycomb repressive complex 2 and E2a are associated with drug resistance in a mouse model of lymphoma GENOME MEDICINE Flinders, C., Lam, L., Rubbi, L., Ferrari, R., Fitz-Gibbon, S., Chen, P., Thompson, M., Christofk, H., Agus, D. B., Ruderman, D., Mallick, P., Pellegrini, M. 2016; 8


    The genetic origins of chemotherapy resistance are well established; however, the role of epigenetics in drug resistance is less well understood. To investigate mechanisms of drug resistance, we performed systematic genetic, epigenetic, and transcriptomic analyses of an alkylating agent-sensitive murine lymphoma cell line and a series of resistant lines derived by drug dose escalation. Dose escalation of the alkylating agent mafosfamide was used to create a series of increasingly drug-resistant mouse Burkitt's lymphoma cell lines. Whole genome sequencing, DNA microarrays, reduced representation bisulfite sequencing, and chromatin immunoprecipitation sequencing were used to identify alterations in DNA sequence, mRNA expression, CpG methylation, and H3K27me3 occupancy, respectively, that were associated with increased resistance. Our data suggest that acquired resistance cannot be explained by genetic alterations. Based on integration of transcriptional profiles with transcription factor binding data, we hypothesize that resistance is driven by epigenetic plasticity. We observed that the resistant cells had H3K27me3 and DNA methylation profiles distinct from those of the parental lines. Moreover, we observed DNA methylation changes in the promoters of genes regulated by E2a and members of the polycomb repressor complex 2 (PRC2) and differentially expressed genes were enriched for targets of E2a. The integrative analysis considering H3K27me3 further supported a role for PRC2 in mediating resistance. By integrating our results with data from the Immunological Genome Project, we showed that these transcriptional changes track the B-cell maturation axis. Our data suggest a novel mechanism of drug resistance in which E2a and PRC2 drive changes in the B-cell epigenome; these alterations attenuate alkylating agent treatment-induced apoptosis.

    View details for DOI 10.1186/s13073-016-0305-0

    View details for Web of Science ID 000375653600001

    View details for PubMedID 27146673

    View details for PubMedCentralID PMC4857420

  • Probabilistic Segmentation of Mass Spectrometry (MS) Images Helps Select Important Ions and Characterize Confidence in the Resulting Segments MOLECULAR & CELLULAR PROTEOMICS Bemis, K. D., Harry, A., Eberlin, L. S., Ferreira, C. R., van de Ven, S. M., Mallick, P., Stolowitz, M., Vitek, O. 2016; 15 (5): 1761-1772


    Mass spectrometry imaging is a powerful tool for investigating the spatial distribution of chemical compounds in a biological sample such as tissue. Two common goals of these experiments are unsupervised segmentation of images into newly discovered homogeneous segments, and supervised classification of images into pre-defined classes. In both cases, the important secondary goals are to characterize the uncertainty associated with the segmentation and with the classification, and to characterize the spectral features that define each segment or class. Recent analysis methods have focused on the spatial structure of the data to improve results. However, they either do not address these secondary goals, or do this with separate post hoc procedures. We introduce spatial shrunken centroids, a statistical model-based framework for both supervised classification and unsupervised segmentation. It takes as input sets of previously detected, aligned, quantified and normalized spectral features, and expresses both spatial and multivariate nature of the data using probabilistic modeling. It selects informative subsets of spectral features that define each unsupervised segment or supervised class, and quantifies and visualizes the uncertainty in spatial segmentations and in tissue classification. In the unsupervised setting, it also guides the choice of an appropriate number of segments. We demonstrate the usefulness of this framework in a supervised human renal cell carcinoma experimental dataset, and several unsupervised experimental datasets, including a pig fetus cross-section, three rodent brains, and a controlled image with known ground truth. This framework is available for use within the open-source R package Cardinal, as part of a full pipeline for the processing, visualization, and statistical analysis of mass spectrometry imaging experiments.

    View details for DOI 10.1074/mcp.O115.053918

    View details for Web of Science ID 000375686100020

    View details for PubMedID 26796117

  • AshwaMAX and Withaferin A inhibits gliomas in cellular and murine orthotopic models JOURNAL OF NEURO-ONCOLOGY Chang, E., Pohling, C., Natarajan, A., Witney, T. H., Kaur, J., Xu, L., Gowrishankar, G., D'Souza, A. L., Murty, S., Schick, S., Chen, L., Wu, N., Khaw, P., Mischel, P., Abbasi, T., Usmani, S., Mallick, P., Gambhir, S. S. 2016; 126 (2): 253-264


    Glioblastoma multiforme (GBM) is an aggressive, malignant cancer (Johnson and O'Neill, J Neurooncol 107: 359-364, 2012). An extract from the winter cherry plant (Withania somnifera), AshwaMAX, is concentrated (4.3%) for Withaferin A, a steroidal lactone that inhibits cancer cells (Vanden Berghe et al., Cancer Epidemiol Biomark Prev 23: 1985-1996, 2014). We hypothesized that AshwaMAX could treat GBM and that bioluminescence imaging (BLI) could track oral therapy in orthotopic murine models of glioblastoma. Human parietal-cortical glioblastoma cells (GBM2, GBM39) were isolated from primary tumors while U87-MG was obtained commercially. GBM2 was transduced with lentiviral vectors that express Green Fluorescent Protein (GFP)/firefly luciferase fusion proteins. Mutational, expression and proliferative status of GBMs were studied. Intracranial xenografts of glioblastomas were grown in the right frontal regions of female, nude mice (n = 3-5 per experiment). Tumor growth was followed through BLI. Neurosphere cultures (U87-MG, GBM2 and GBM39) were inhibited by AshwaMAX at IC50 of 1.4, 0.19 and 0.22 µM equivalent respectively and by Withaferin A with IC50 of 0.31, 0.28 and 0.25 µM respectively. Oral gavage, every other day, of AshwaMAX (40 mg/kg per day) significantly reduced bioluminescence signal (n = 3 mice, p < 0.02, four parameter non-linear regression analysis) in preclinical models. After 30 days of treatment, bioluminescent signal increased suggesting onset of resistance. BLI signal for control, vehicle-treated mice increased and then plateaued. Bioluminescent imaging revealed diffuse growth of GBM2 xenografts. With AshwaMAX, GBM neurospheres collapsed at nanomolar concentrations. Oral treatment studies on murine models confirmed that AshwaMAX is effective against orthotopic GBM. AshwaMAX is thus a promising candidate for future clinical translation in patients with GBM.

    View details for DOI 10.1007/s11060-015-1972-1

    View details for Web of Science ID 000368728300005

  • A high-content image-based method for quantitatively studying context-dependent cell population dynamics. Scientific reports Garvey, C. M., Spiller, E., Lindsay, D., Chiang, C., Choi, N. C., Agus, D. B., Mallick, P., Foo, J., Mumenthaler, S. M. 2016; 6: 29752


    Tumor progression results from a complex interplay between cellular heterogeneity, treatment response, microenvironment and heterocellular interactions. Existing approaches to characterize this interplay suffer from an inability to distinguish between multiple cell types, often lack environmental context, and are unable to perform multiplex phenotypic profiling of cell populations. Here we present a high-throughput platform for characterizing, with single-cell resolution, the dynamic phenotypic responses (i.e. morphology changes, proliferation, apoptosis) of heterogeneous cell populations both during standard growth and in response to multiple, co-occurring selective pressures. The speed of this platform enables a thorough investigation of the impacts of diverse selective pressures including genetic alterations, therapeutic interventions, heterocellular components and microenvironmental factors. The platform has been applied to both 2D and 3D culture systems and readily distinguishes between (1) cytotoxic versus cytostatic cellular responses; and (2) changes in morphological features over time and in response to perturbation. These important features can directly influence tumor evolution and clinical outcome. Our image-based approach provides a deeper insight into the cellular dynamics and heterogeneity of tumors (or other complex systems), with reduced reagents and time, offering advantages over traditional biological assays.

    View details for DOI 10.1038/srep29752

    View details for PubMedID 27452732

  • New Horizons in Intact Protein Analysis: Optimization of Top-Down Protein Analysis CHEMICAL & ENGINEERING NEWS Sharma, S., Mallick, P., Stoyanova, T., Mullen, C., Weisbrod, C., Canterbury, J., Horn, D., Zabrouskov, V. 2015: 12-14

  • A fully human scFv phage display library for rapid antibody fragment reformatting PROTEIN ENGINEERING DESIGN & SELECTION Li, K., Zettlitz, K. A., Lipianskaya, J., Zhou, Y., Marks, J. D., Mallick, P., Reiter, R. E., Wu, A. M. 2015; 28 (10): 307-315


    Phage display libraries of human single-chain variable fragments (scFvs) are a reliable source of fully human antibodies for scientific and clinical applications. Frequently, scFvs form the basis of larger, bivalent formats to increase valency and avidity. A small and versatile bivalent antibody fragment is the diabody, a cross-paired scFv dimer (∼55 kDa). However, generation of diabodies from selected scFvs requires decreasing the length of the interdomain scFv linker, typically by overlap PCR. To simplify this process, we designed two scFv linkers with integrated restriction sites for easy linker length reduction (17-residue to 7-residue or 18-residue to 5-residue, respectively) and generated two fully human scFv phage display libraries. The larger library (9 × 10^9 functional members) was employed for selection against a model antigen, human N-cadherin, yielding novel scFv clones with low nanomolar monovalent affinities. ScFv clones from both libraries were reformatted into diabodies by restriction enzyme digestion and re-ligation. Size-exclusion chromatography analysis confirmed the proper dimerization of most of the diabodies. In conclusion, these specially designed scFv phage display libraries allow us to rapidly reformat the selected scFvs into diabodies, which can greatly accelerate early stage antibody development when bivalent fragments are needed for candidate screening.

    View details for DOI 10.1093/protein/gzv024

    View details for Web of Science ID 000362837000002

    View details for PubMedID 25991864

  • Cardinal: an R package for statistical analysis of mass spectrometry-based imaging experiments BIOINFORMATICS Bemis, K. D., Harry, A., Eberlin, L. S., Ferreira, C., van de Ven, S. M., Mallick, P., Stolowitz, M., Vitek, O. 2015; 31 (14): 2418-2420


    Cardinal is an R package for statistical analysis of mass spectrometry-based imaging (MSI) experiments of biological samples such as tissues. Cardinal supports both Matrix-Assisted Laser Desorption/Ionization (MALDI) and Desorption Electrospray Ionization-based MSI workflows, and experiments with multiple tissues and complex designs. The main analytical functionalities include (1) image segmentation, which partitions a tissue into regions of homogeneous chemical composition, selects the number of segments and the subset of informative ions, and characterizes the associated uncertainty and (2) image classification, which assigns locations on the tissue to pre-defined classes, selects the subset of informative ions, and estimates the resulting classification error by (cross-) validation. The statistical methods are based on mixture modeling and regularization. Contact: o.vitek@neu.edu. The code, the documentation, and examples are available open-source under the Artistic-2.0 license.

    View details for DOI 10.1093/bioinformatics/btv146

    View details for Web of Science ID 000358173500034

  • Predictive Modeling of Drug Response in Non-Hodgkin's Lymphoma PLOS ONE Frieboes, H. B., Smith, B. R., Wang, Z., Kotsuma, M., Ito, K., Day, A., Cahill, B., Flinders, C., Mumenthaler, S. M., Mallick, P., Simbawa, E., Al-Fhaid, A. S., Mahmoud, S. R., Gambhir, S. S., Cristini, V. 2015; 10 (6)


    We combine mathematical modeling with experiments in living mice to quantify the relative roles of intrinsic cellular vs. tissue-scale physiological contributors to chemotherapy drug resistance, which are difficult to understand solely through experimentation. Experiments in cell culture and in mice with drug-sensitive (Eµ-myc/Arf-/-) and drug-resistant (Eµ-myc/p53-/-) lymphoma cell lines were conducted to calibrate and validate a mechanistic mathematical model. Inputs to inform the model include tumor drug transport characteristics, such as blood volume fraction, average geometric mean blood vessel radius, drug diffusion penetration distance, and drug response in cell culture. Model results show that the drug response in mice, represented by the fraction of dead tumor volume, can be reliably predicted from these inputs. Hence, a proof-of-principle for predictive quantification of lymphoma drug therapy was established based on both cellular and tissue-scale physiological contributions. We further demonstrate that, if the in vitro cytotoxic response of a specific cancer cell line under chemotherapy is known, the model is then able to predict the treatment efficacy in vivo. Lastly, tissue blood volume fraction was determined to be the most sensitive model parameter and a primary contributor to drug resistance.

    View details for DOI 10.1371/journal.pone.0129433

    View details for Web of Science ID 000355979500143

    View details for PubMedID 26061425

  • Neuronal Activity Promotes Glioma Growth through Neuroligin-3 Secretion CELL Venkatesh, H. S., Johung, T. B., Caretti, V., Noll, A., Tang, Y., Nagaraja, S., Gibson, E. M., Mount, C. W., Polepalli, J., Mitra, S. S., Woo, P. J., Malenka, R. C., Vogel, H., Bredel, M., Mallick, P., Monje, M. 2015; 161 (4): 803-816


    Active neurons exert a mitogenic effect on normal neural precursor and oligodendroglial precursor cells, the putative cellular origins of high-grade glioma (HGG). By using optogenetic control of cortical neuronal activity in a patient-derived pediatric glioblastoma xenograft model, we demonstrate that active neurons similarly promote HGG proliferation and growth in vivo. Conditioned medium from optogenetically stimulated cortical slices promoted proliferation of pediatric and adult patient-derived HGG cultures, indicating secretion of activity-regulated mitogen(s). The synaptic protein neuroligin-3 (NLGN3) was identified as the leading candidate mitogen, and soluble NLGN3 was sufficient and necessary to promote robust HGG cell proliferation. NLGN3 induced PI3K-mTOR pathway activity and feedforward expression of NLGN3 in glioma cells. NLGN3 expression levels in human HGG negatively correlated with patient overall survival. These findings indicate the important role of active neurons in the brain tumor microenvironment and identify secreted NLGN3 as an unexpected mechanism promoting neuronal activity-regulated cancer growth.

    View details for DOI 10.1016/j.cell.2015.04.012

    View details for Web of Science ID 000354175200014

    View details for PubMedID 25913192

    View details for PubMedCentralID PMC4447122

  • Building high-quality assay libraries for targeted analysis of SWATH MS data. Nature protocols Schubert, O. T., Gillet, L. C., Collins, B. C., Navarro, P., Rosenberger, G., Wolski, W. E., Lam, H., Amodei, D., Mallick, P., MacLean, B., Aebersold, R. 2015; 10 (3): 426-441


    Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.

    View details for DOI 10.1038/nprot.2015.015

    View details for PubMedID 25675208

  • Simulation of the Protein-Shedding Kinetics of a Fully Vascularized Tumor. Cancer informatics Frieboes, H. B., Curtis, L. T., Wu, M., Kani, K., Mallick, P. 2015; 14: 163-175


    Circulating biomarkers are of significant interest for cancer detection and treatment personalization. However, the biophysical processes that determine how proteins are shed from cancer cells or their microenvironment, diffuse through tissue, enter blood vasculature, and persist in circulation remain poorly understood. Since approaches primarily focused on experimental evaluation are incapable of measuring the shedding and persistence for every possible marker candidate, we propose an interdisciplinary computational/experimental approach that includes computational modeling of tumor tissue heterogeneity. The model implements protein production, transport, and shedding based on tumor vascularization, cell proliferation, hypoxia, and necrosis, thus quantitatively relating the tumor and circulating proteomes. The results highlight the dynamics of shedding as a function of protein diffusivity and production. Linking the simulated tumor parameters to clinical tumor and vascularization measurements could potentially enable this approach to reveal the tumor-specific conditions based on the protein detected in circulation and thus help to more accurately manage cancer diagnosis and treatment.

    View details for DOI 10.4137/CIN.S35374

    View details for PubMedID 26715830

  • Anti-MET immunoPET for non-small cell lung cancer using novel fully human antibody fragments. Molecular cancer therapeutics Li, K., Tavaré, R., Zettlitz, K. A., Mumenthaler, S. M., Mallick, P., Zhou, Y., Marks, J. D., Wu, A. M. 2014; 13 (11): 2607-2617


    MET, the receptor of hepatocyte growth factor, plays important roles in tumorigenesis and drug resistance in numerous cancers, including non-small cell lung cancer (NSCLC). As increasing numbers of MET inhibitors are being developed for clinical applications, antibody fragment-based immunopositron emission tomography (immunoPET) has the potential to rapidly quantify in vivo MET expression levels for drug response evaluation and patient stratification for these targeted therapies. Here, fully human single-chain variable fragments (scFvs) isolated from a phage display library were reformatted into bivalent cys-diabodies (scFv-cys dimers) with affinities to MET ranging from 0.7 to 5.1 nmol/L. The candidate with the highest affinity, H2, was radiolabeled with (89)Zr for immunoPET studies targeting NSCLC xenografts: low MET-expressing Hcc827 and the gefitinib-resistant Hcc827-GR6 with 4-fold MET overexpression. ImmunoPET at as early as 4 hours after injection produced high-contrast images, and ex vivo biodistribution analysis at 20 hours after injection showed about 2-fold difference in tracer uptake levels between the parental and resistant tumors (P < 0.01). Further immunoPET studies using a larger fragment, the H2 minibody (scFv-CH3 dimer), produced similar results at later time points. Two of the antibody clones (H2 and H5) showed in vitro growth inhibitory effects on MET-dependent gefitinib-resistant cell lines, whereas no effects were observed on resistant lines lacking MET activation. In conclusion, these fully human antibody fragments inhibit MET-dependent cancer cells and enable rapid immunoPET imaging to assess MET expression levels, showing potential for both therapeutic and diagnostic applications.

    View details for DOI 10.1158/1535-7163.MCT-14-0363

    View details for PubMedID 25143449

  • Employing ProteoWizard to Convert Raw Mass Spectrometry Data. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] Holman, J. D., Tabb, D. L., Mallick, P. 2014; 46: 13.24.1-13.24.9


    After raw data have been captured by mass spectrometers in biological LC-MS/MS experiments, they must be converted from vendor-specific binary files to open-format files for manipulation by most software. This protocol details the use of ProteoWizard software for this conversion, taking format features, coding options, and vendor particularities into account. This protocol will aid researchers in preparing their data for analysis by database search engines and other bioinformatics tools. Curr. Protoc. Bioinform. 46:13.24.1-13.24.9. © 2014 by John Wiley & Sons, Inc.

    View details for DOI 10.1002/0471250953.bi1324s46

    View details for PubMedID 24939128

  • Characterizing deformability and surface friction of cancer cells. Proceedings of the National Academy of Sciences of the United States of America Byun, S., Son, S., Amodei, D., Cermak, N., Shaw, J., Kang, J. H., Hecht, V. C., Winslow, M. M., Jacks, T., Mallick, P., Manalis, S. R. 2013; 110 (19): 7580-7585


    Metastasis requires the penetration of cancer cells through tight spaces, which is mediated by the physical properties of the cells as well as their interactions with the confined environment. Various microfluidic approaches have been devised to mimic traversal in vitro by measuring the time required for cells to pass through a constriction. Although a cell's passage time is expected to depend on its deformability, measurements from existing approaches are confounded by a cell's size and its frictional properties with the channel wall. Here, we introduce a device that enables the precise measurement of (i) the size of a single cell, given by its buoyant mass, (ii) the velocity of the cell entering a constricted microchannel (entry velocity), and (iii) the velocity of the cell as it transits through the constriction (transit velocity). Changing the deformability of the cell by perturbing its cytoskeleton primarily alters the entry velocity, whereas changing the surface friction by immobilizing positive charges on the constriction's walls primarily alters the transit velocity, indicating that these parameters can give insight into the factors affecting the passage of each cell. When accounting for cell buoyant mass, we find that cells possessing higher metastatic potential exhibit faster entry velocities than cells with lower metastatic potential. We additionally find that some cell types with higher metastatic potential exhibit greater than expected changes in transit velocities, suggesting that not only the increased deformability but reduced friction may be a factor in enabling invasive cancer cells to efficiently squeeze through tight spaces.

    View details for DOI 10.1073/pnas.1218806110

    View details for PubMedID 23610435

  • A physical sciences network characterization of non-tumorigenic and metastatic cells SCIENTIFIC REPORTS Agus, D. B., Alexander, J. F., Arap, W., Ashili, S., Aslan, J. E., Austin, R. H., Backman, V., Bethel, K. J., Bonneau, R., Chen, W., Chen-Tanyolac, C., Choi, N. C., Curley, S. A., Dallas, M., Damania, D., Davies, P. C., Decuzzi, P., Dickinson, L., Estevez-Salmeron, L., Estrella, V., Ferrari, M., Fischbach, C., Foo, J., Fraley, S. I., Frantz, C., Fuhrmann, A., Gascard, P., Gatenby, R. A., Geng, Y., Gerecht, S., Gillies, R. J., Godin, B., Grady, W. M., Greenfield, A., Hemphill, C., Hempstead, B. L., Hielscher, A., Hillis, W. D., Holland, E. C., Ibrahim-Hashim, A., Jacks, T., Johnson, R. H., Joo, A., Katz, J. E., Kelbauskas, L., Kesselman, C., King, M. R., Konstantopoulos, K., Kraning-Rush, C. M., Kuhn, P., Kung, K., Kwee, B., Lakins, J. N., Lambert, G., Liao, D., Licht, J. D., Liphardt, J. T., Liu, L., Lloyd, M. C., Lyubimova, A., Mallick, P., Marko, J., McCarty, O. J., Meldrum, D. R., Michor, F., Mumenthaler, S. M., Nandakumar, V., O'Halloran, T. V., Oh, S., Pasqualini, R., Paszek, M. J., Philips, K. G., Poultney, C. S., Rana, K., Reinhart-King, C. A., Ros, R., Semenza, G. L., Senechal, P., Shuler, M. L., Srinivasan, S., Staunton, J. R., Stypula, Y., Subramanian, H., Tlsty, T. D., Tormoen, G. W., Tseng, Y., van Oudenaarden, A., Verbridge, S. S., Wan, J. C., Weaver, V. M., Widom, J., Will, C., Wirtz, D., Wojtkowiak, J., Wu, P. 2013; 3


    To investigate the transition from non-cancerous to metastatic from a physical sciences perspective, the Physical Sciences-Oncology Centers (PS-OC) Network performed molecular and biophysical comparative studies of the non-tumorigenic MCF-10A and metastatic MDA-MB-231 breast epithelial cell lines, commonly used as models of cancer metastasis. Experiments were performed in 20 laboratories from 12 PS-OCs. Each laboratory was supplied with identical aliquots and common reagents and culture protocols. Analyses of these measurements revealed dramatic differences in their mechanics, migration, adhesion, oxygen response, and proteomic profiles. Model-based multi-omics approaches identified key differences between these cells' regulatory networks involved in morphology and survival. These results provide a multifaceted description of cellular parameters of two widely used cell lines and demonstrate the value of the PS-OC Network approach for integration of diverse experimental observations to elucidate the phenotypes associated with cancer metastasis.

    View details for DOI 10.1038/srep01449

    View details for Web of Science ID 000318061300001

    View details for PubMedID 23618955

    View details for PubMedCentralID PMC3636513

  • Anterior gradient 2 (AGR2): Blood-based biomarker elevated in metastatic prostate cancer associated with the neuroendocrine phenotype PROSTATE Kani, K., Malihi, P. D., Jiang, Y., Wang, H., Wang, Y., Ruderman, D. L., Agus, D. B., Mallick, P., Gross, M. E. 2013; 73 (3): 306-315


    Anterior gradient 2 (AGR2) is associated with metastatic progression in prostate cancer cells as well as other normal and malignant tissues. We investigated AGR2 expression in patients with metastatic prostate cancer. Blood was collected from 44 patients with metastatic prostate cancer separated as: castration sensitive prostate cancer (CSPC, n = 5); castration resistant prostate cancer (CRPC, n = 36); and neuroendocrine-predominate CRPC defined by PSA ≤ 1 ng/ml in the presence of wide-spread metastatic disease (NE-CRPC, n = 3). AGR2 mRNA levels were measured with RT-PCR in circulating tumor cell (CTC)-enriched peripheral blood. Plasma AGR2 levels were determined via ELISA assay. AGR2 expression was modulated in prostate cancer cell lines using plasmid and viral vectors. AGR2 mRNA levels are elevated in CTCs and strongly correlated with CTC enumeration. Plasma AGR2 levels are elevated in all sub-groups. AGR2 levels vary independently of PSA and change in some patients in response to androgen-directed and other therapies. Plasma AGR2 levels are highest in the NE-CRPC sub-group. A correlation between AGR2, chromogranin A (CGA), and neuron-specific enolase (NSE) expression is demonstrated in prostate cancer cell lines. We conclude that AGR2 expression is elevated at the mRNA and protein level in patients with metastatic prostate cancer. In particular, we find that AGR2 expression is associated with features consistent with neuroendocrine, or anaplastic, prostate cancer, exemplified by an aggressive clinical phenotype without elevation in circulating PSA levels. Further studies are warranted to explore the mechanistic and prognostic implications of AGR2 expression in this patient population.

    View details for DOI 10.1002/pros.22569

    View details for Web of Science ID 000313895900010

    View details for PubMedID 22911164

  • Concurrent Transcript and Protein Quantification in a Massive Single Cell Array Enables Population-Wide Observation of Oncogene Escape 57th Annual Meeting of the Biophysical-Society Park, S., Lee, J. Y., Hong, S., Dimov, I. K., Li, K., Wu, A. M., Mumenthaler, S., Mallick, P., Lee, L. P. CELL PRESS. 2013: 686A–686A
  • Unexpected Dissemination Patterns in Lymphoma Progression Revealed by Serial Imaging within a Murine Lymph Node CANCER RESEARCH Ito, K., Smith, B. R., Parashurama, N., Yoon, J., Song, S. Y., Miething, C., Mallick, P., Lowe, S., Gambhir, S. S. 2012; 72 (23): 6111-6118


    Non-Hodgkin lymphoma (NHL) is a heterogeneous and highly disseminated disease, but the mechanisms of its growth and dissemination are not well understood. Using a mouse model of this disease, we used multimodal imaging, including intravital microscopy (IVM) combined with bioluminescence, as a powerful tool to better elucidate NHL progression. We injected enhanced green fluorescent protein and luciferase-expressing Eμ-Myc/Arf(-/-) (Cdkn2a(-/-)) mouse lymphoma cells (EL-Arf(-/-)) into C57BL/6NCrl mice intravenously. Long-term observation inside a peripheral lymph node was enabled by a novel lymph node internal window chamber technique that allows chronic, sequential lymph node imaging under in vivo physiologic conditions. Interestingly, during early stages of tumor progression we found that few if any lymphoma cells homed initially to the inguinal lymph node (ILN), despite clear evidence of lymphoma cells in the bone marrow and spleen. Unexpectedly, we detected a reproducible efflux of lymphoma cells from spleen and bone marrow, concomitant with a massive and synchronous influx of lymphoma cells into the ILN, several days after injection. We confirmed a coordinated efflux/influx of tumor cells by injecting EL-Arf(-/-) lymphoma cells directly into the spleen and observing a burst of lymphoma cells, validating that the burst originated in organs remote from the lymph nodes. Our findings argue that in NHL an efflux of tumor cells from one disease site to another, distant site in which they become established occurs in discrete bursts.

    View details for DOI 10.1158/0008-5472.CAN-12-2579

    View details for Web of Science ID 000311893100005

    View details for PubMedID 23033441

  • Quantitative Proteomic Profiling Identifies Protein Correlates to EGFR Kinase Inhibition MOLECULAR CANCER THERAPEUTICS Kani, K., Faca, V. M., Hughes, L. D., Zhang, W., Fang, Q., Shahbaba, B., Luethy, R., Erde, J., Schmidt, J., Pitteri, S. J., Zhang, Q., Katz, J. E., Gross, M. E., Plevritis, S. K., McIntosh, M. W., Jain, A., Hanash, S., Agus, D. B., Mallick, P. 2012; 11 (5): 1071-1081


    Clinical oncology is hampered by lack of tools to accurately assess a patient's response to pathway-targeted therapies. Serum and tumor cell surface proteins whose abundance, or change in abundance in response to therapy, differentiates patients responding to a therapy from patients not responding to a therapy could be usefully incorporated into tools for monitoring response. Here, we posit and then verify that proteomic discovery in in vitro tissue culture models can identify proteins with concordant in vivo behavior and further, can be a valuable approach for identifying tumor-derived serum proteins. In this study, we use stable isotope labeling of amino acids in culture (SILAC) with proteomic technologies to quantitatively analyze the gefitinib-related protein changes in a model system for sensitivity to EGF receptor (EGFR)-targeted tyrosine kinase inhibitors. We identified 3,707 intracellular proteins, 1,276 cell surface proteins, and 879 shed proteins. More than 75% of the proteins identified had quantitative information, and a subset consisting of 400 proteins showed a statistically significant change in abundance following gefitinib treatment. We validated the change in expression profile in vitro and screened our panel of response markers in an in vivo isogenic resistant model and showed that these were markers of gefitinib response and not simply markers of phospho-EGFR downregulation. In doing so, we also were able to identify which proteins might be useful as markers for monitoring response and which proteins might be useful as markers for a priori prediction of response.

    View details for DOI 10.1158/1535-7163.MCT-11-0852

    View details for Web of Science ID 000307984800003

    View details for PubMedID 22411897

  • Investigation of acquired resistance to EGFR-targeted therapies in lung cancer using cDNA microarrays. Methods in molecular biology (Clifton, N.J.) Kani, K., Sordella, R., Mallick, P. 2012; 795: 233-253


    Clinical tools to accurately describe, evaluate, and predict an individual's response to cancer therapy are a field-wide priority; in many advanced cancers, only 10-20% of individuals will have a clinical benefit from therapy, yet we treat the entire population. Furthermore, many therapies are initially effective, but lose effectiveness over time. Here we describe methods to derive in vitro models of resistance to EGFR tyrosine kinase inhibitors. We additionally describe approaches to characterize possible mechanisms of resistance by genomic and transcriptomic approaches.

    View details for DOI 10.1007/978-1-61779-337-0_16

    View details for PubMedID 21960227

  • Cancer as a Multi-scale Complex Adaptive System Assessment Of Physical Sciences And Engineering Advances In Life Sciences And Oncology (Aphelion) In Europe Parag Mallick 2012: 4-21
  • Installation and use of LabKey Server for proteomics. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] Eckels, J., Hussey, P., Nelson, E. K., Myers, T., Rauch, A., Bellew, M., Connolly, B., Law, W., Eng, J. K., Katz, J., McIntosh, M., Mallick, P., Igra, M. 2011; Chapter 13: Unit 13 5-?


    LabKey Server (formerly CPAS, the Computational Proteomics Analysis System) provides a Web-based platform for mining data from liquid chromatography-tandem mass spectrometry (LC-MS/MS) proteomic experiments. This open source platform supports systematic proteomic analyses and secure data management, integration, and sharing. LabKey Server incorporates several tools currently used in proteomic analysis, including the X! Tandem search engine, the ProteoWizard toolkit, and the PeptideProphet and ProteinProphet data mining tools. These tools and others are integrated into LabKey Server, which provides an extensible architecture for developing high-throughput biological applications. The LabKey Server analysis pipeline acts on data in standardized file formats, so that researchers may use LabKey Server with other search engines, including Mascot or SEQUEST, that follow a standardized format for reporting search engine results. Supported builds of LabKey Server are freely available at Documentation and source code are available under the Apache License 2.0 at

    View details for DOI 10.1002/0471250953.bi1305s36

    View details for PubMedID 22161569

  • Evolutionary Modeling of Combination Treatment Strategies To Overcome Resistance to Tyrosine Kinase Inhibitors in Non-Small Cell Lung Cancer MOLECULAR PHARMACEUTICS Mumenthaler, S. M., Foo, J., Leder, K., Choi, N. C., Agus, D. B., Pao, W., Mallick, P., Michor, F. 2011; 8 (6): 2069-2079


    Many initially successful anticancer therapies lose effectiveness over time, and eventually, cancer cells acquire resistance to the therapy. Acquired resistance remains a major obstacle to improving remission rates and achieving prolonged disease-free survival. Consequently, novel approaches to overcome or prevent resistance are of significant clinical importance. There has been considerable interest in treating non-small cell lung cancer (NSCLC) with combinations of EGFR-targeted therapeutics (e.g., erlotinib) and cytotoxic therapeutics (e.g., paclitaxel); however, acquired resistance to erlotinib, driven by a variety of mechanisms, remains an obstacle to treatment success. In about 50% of cases, resistance is due to a T790M point mutation in EGFR, and T790M-containing cells ultimately dominate the tumor composition and lead to tumor regrowth. We employed a combined experimental and mathematical modeling-based approach to identify treatment strategies that impede the outgrowth of primary T790M-mediated resistance in NSCLC populations. Our mathematical model predicts the population dynamics of mixtures of sensitive and resistant cells, thereby describing how the tumor composition, initial fraction of resistant cells, and degree of selective pressure influence the time until progression of disease. Model development relied upon quantitative experimental measurements of cell proliferation and death using a novel microscopy approach. Using this approach, we systematically explored the space of combination treatment strategies and demonstrated that optimally timed sequential strategies yielded large improvements in survival outcome relative to monotherapies at the same concentrations. Our investigations revealed regions of the treatment space in which low-dose sequential combination strategies, after preclinical validation, may lead to a tumor reduction and improved survival outcome for patients with T790M-mediated resistance.

    View details for DOI 10.1021/mp200270v

    View details for Web of Science ID 000297537300011

    View details for PubMedID 21995722

  • A High-Confidence Human Plasma Proteome Reference Set with Estimated Concentrations in PeptideAtlas MOLECULAR & CELLULAR PROTEOMICS Farrah, T., Deutsch, E. W., Omenn, G. S., Campbell, D. S., Sun, Z., Bletz, J. A., Mallick, P., Katz, J. E., Malmstroem, J., Ossola, R., Watts, J. D., Lin, B., Zhang, H., Moritz, R. L., Aebersold, R. 2011; 10 (9)


    Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments. Applying advanced computational methods developed for the analysis and integration of very large and diverse data sets generated by tandem MS measurements of tryptic peptides, we have now compiled a high-confidence human plasma proteome reference set with well over twice the identified proteins of previous high-confidence sets. It includes a hierarchy of protein identifications at different levels of redundancy following a clearly defined scheme, which we propose as a standard that can be applied to any proteomics data set to facilitate cross-proteome analyses. Further, to aid in development of blood-based diagnostics using techniques such as selected reaction monitoring, we provide a rough estimate of protein concentrations using spectral counting. We identified 20,433 distinct peptides, from which we inferred a highly nonredundant set of 1929 protein sequences at a false discovery rate of 1%. We have made this resource available via PeptideAtlas, a large, multiorganism, publicly accessible compendium of peptides identified in tandem MS experiments conducted by laboratories around the world.

    View details for DOI 10.1074/mcp.M110.006353

    View details for Web of Science ID 000294729200003

    View details for PubMedID 21632744

  • Applying Multi-Agent Techniques to Cancer Modeling Proceedings of the Sixth Workshop on Multiagent Sequential Decision Making in Uncertain Domains Brown M, Bowring Epstein S, Maheswaran R, Mallick P, Tambe M. 2011
  • Interactively Mapping Data Sources into the Semantic Web Proceedings of The First International Symposium on Linked Science Knoblock C, Szekely P, Ambite JL, Gupta S, Aman Goel, Muslea M, Lerman K, Mallick P 2011; 783
  • Model-based discovery of circulating biomarkers. Methods in molecular biology (Clifton, N.J.) Vogelsang, M. S., Kani, K., Katz, J. E., Mallick, P. 2011; 728: 87-107


    Proteomic-based biomarker discovery approaches broadly attempt to identify proteins whose basal abundance, or change in abundance in response to a perturbation (e.g., a therapeutic intervention) is able to discriminate between populations of patients. Up until recently, the majority of approaches for discovering circulating biomarkers have focused on directly profiling serum or plasma to identify such proteins. However, the complexity and dynamic range of protein abundance in serum and plasma create a significant challenge for proteomics methods. To overcome these barriers, diverse approaches to simplify or to fractionate serum and plasma have been developed. For some diseases, such as those related to specific organs, there may be useful marker proteins that originate in the organ. Here, we describe an approach for marker discovery that focuses on the profiling of either primary tissue or cell culture models thereof.

    View details for DOI 10.1007/978-1-61779-068-3_5

    View details for PubMedID 21468942

  • Peptide Identification from Mixture Tandem Mass Spectra MOLECULAR & CELLULAR PROTEOMICS Wang, J., Perez-Santiago, J., Katz, J. E., Mallick, P., Bandeira, N. 2010; 9 (7): 1476-1485


    The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture-spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra.

    View details for DOI 10.1074/mcp.M000136-MCP201

    View details for Web of Science ID 000279397200009

    View details for PubMedID 20348588

  • Proteomics: a pragmatic perspective NATURE BIOTECHNOLOGY Mallick, P., Kuster, B. 2010; 28 (7): 695-709


    The evolution of mass spectrometry-based proteomic technologies has advanced our understanding of the complex and dynamic nature of proteomes while concurrently revealing that no 'one-size-fits-all' proteomic strategy can be used to address all biological questions. Whereas some techniques, such as those for analyzing protein complexes, have matured and are broadly applied with great success, others, such as global quantitative protein expression profiling for biomarker discovery, are still confined to a few expert laboratories. In this Perspective, we attempt to distill the wide array of conceivable proteomic approaches into a compact canon of techniques suited to asking and answering specific types of biological questions. By discussing the relationship between the complexity of a biological sample and the difficulty of implementing the appropriate analysis approach, we contrast areas of proteomics broadly usable today with those that require significant technical and conceptual development. We hope to provide nonexperts with a guide for calibrating expectations of what can realistically be learned from a proteomics experiment and for gauging the planning and execution effort. We further provide a detailed supplement explaining the most common techniques in proteomics.

    View details for DOI 10.1038/nbt.1658

    View details for Web of Science ID 000279723900027

    View details for PubMedID 20622844

  • Mass spectrometry based proteomics in cancer research Modern Molecular Biology: Approaches for Unbiased Discovery in Cancer Research Abbani M, Mallick P, Vogelsang M 2010: 117-156
  • Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles JOURNAL OF PROTEOME RESEARCH Rodriguez, H., Snyder, M., Uhlen, M., Andrews, P., Beavis, R., Borchers, C., Chalkley, R. J., Cho, S. Y., Cottingham, K., Dunn, M., Dylag, T., Edgar, R., Hare, P., Heck, A. J., Hirsch, R. F., Kennedy, K., Kolar, P., Kraus, H., Mallick, P., Nesvizhskii, A., Ping, P., Ponten, F., Yang, L., Yates, J. R., Stein, S. E., Hermjakob, H., Kinsinger, C. R., Apweiler, R. 2009; 8 (7): 3689-3692


    Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.

    View details for DOI 10.1021/pr900023z

    View details for Web of Science ID 000267694600043

    View details for PubMedID 19344107

  • ProteoWizard: open source software for rapid proteomics tools development BIOINFORMATICS Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P. 2008; 24 (21): 2534-2536


    The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library, which contains readers and writers of the mzML data format, has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge from the project website, which also provides code examples and documentation. It is our hope the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged.

    View details for DOI 10.1093/bioinformatics/btn323

    View details for Web of Science ID 000260381200017

    View details for PubMedID 18606607

  • Halobacterium salinarum NRC-1 PeptideAtlas: Toward strategies for targeted proteomics and improved proteome coverage JOURNAL OF PROTEOME RESEARCH Van, P. T., Schmid, A. K., King, N. L., Kaur, A., Pan, M., Whitehead, K., Koide, T., Facciotti, M. T., Goo, Y. A., Deutsch, E. W., Reiss, D. J., Mallick, P., Baliga, N. S. 2008; 7 (9): 3755-3764


    The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.

    View details for DOI 10.1021/pr800031f

    View details for Web of Science ID 000259015500014

    View details for PubMedID 18652504

  • Precursor-ion mass re-estimation improves peptide identification on hybrid instruments JOURNAL OF PROTEOME RESEARCH Luethy, R., Kessner, D. E., Katz, J. E., McLean, B., Grothe, R., Kani, K., Faca, V., Pitteri, S., Hanash, S., Agus, D. B., Mallick, P. 2008; 7 (9): 4031-4039


    Mass spectrometry-based proteomics experiments have become an important tool for studying biological systems. Identifying the proteins in complex mixtures by assigning peptide fragmentation spectra to peptide sequences is an important step in the proteomics process. The 1-2 ppm mass-accuracy of hybrid instruments, like the LTQ-FT, has been cited as a key factor in their ability to identify a larger number of peptides with greater confidence than competing instruments. However, in replicate experiments of an 18-protein mixture, we note parent masses deviate 171 ppm, on average, for ion-trap data directed identifications and 8 ppm, on average, for preview Fourier transform (FT) data directed identifications. These deviations are neither caused by poor calibration nor by excessive ion-loading and are most likely due to errors in parent mass estimation. To improve these deviations, we introduce msPrefix, a program to re-estimate a peptide's parent mass from an associated high-accuracy full-scan survey spectrum. In 18-protein mixture experiments, msPrefix parent mass estimates deviate only 1 ppm, on average, from the identified peptides. In a cell lysate experiment searched with a tolerance of 50 ppm, 2295 peptides were confidently identified using native data and 4560 using msPrefixed data. Likewise, in a plasma experiment searched with a tolerance of 50 ppm, 326 peptides were identified using native data and 1216 using msPrefixed data. msPrefix is also able to determine which MS/MS spectra were possibly derived from multiple precursor ions. In complex mixture experiments, we demonstrate that more than 50% of triggered MS/MS may have had multiple precursor ions and note that spectra with multiple candidate ions are less likely to result in an identification using TANDEM. These results demonstrate integration of msPrefix into traditional shotgun proteomics workflows significantly improves identification results.

    View details for DOI 10.1021/pr800307m

    View details for Web of Science ID 000259015500038

    View details for PubMedID 18707148
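
    The ppm figures quoted in the abstract above are relative mass errors. A minimal sketch of that arithmetic, assuming the conventional definition (observed minus theoretical, divided by theoretical, times one million); the function names and example masses are hypothetical and are not taken from msPrefix.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass deviation in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float) -> bool:
    """True if the precursor mass falls inside a symmetric ppm search window."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm

# Illustrative masses only: a 1000 Da peptide measured with different errors.
small_error = ppm_error(1000.001, 1000.0)            # about 1 ppm
passes = within_tolerance(1000.008, 1000.0, 50.0)    # about 8 ppm, inside 50 ppm
fails = within_tolerance(1000.171, 1000.0, 50.0)     # about 171 ppm, outside 50 ppm
```

    With a 50 ppm search tolerance, a precursor that is off by roughly 171 ppm is never compared against its correct peptide, which is consistent with the gain in identifications the abstract reports after re-estimation.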

  • The standard protein mix database: A diverse data set to assist in the production of improved peptide and protein identification software tools JOURNAL OF PROTEOME RESEARCH Klimek, J., Eddes, J. S., Hohmann, L., Jackson, J., Peterson, A., Letarte, S., Gafken, P. R., Katz, J. E., Mallick, P., Lee, H., Schmidt, A., Ossola, R., Eng, J. K., Aebersold, R., Martin, D. B. 2008; 7 (1): 96-103


    Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at

    View details for DOI 10.1021/pr070244j

    View details for Web of Science ID 000252154200012

    View details for PubMedID 17711323

  • Computational prediction of proteotypic peptides for quantitative proteomics NATURE BIOTECHNOLOGY Mallick, P., Schirle, M., Chen, S. S., Flory, M. R., Lee, H., Martin, D., Raught, B., Schmitt, R., Werner, T., Kuster, B., Aebersold, R. 2007; 25 (1): 125-131

    View details for DOI 10.1038/nbt1275

    View details for Web of Science ID 000243491000038

  • Quantitative proteomic analysis of the budding yeast cell cycle using acid-cleavable isotope-coded affinity tag reagents PROTEOMICS Flory, M. R., Lee, H., Bonneau, R., Mallick, P., Serikawa, K., Morris, D. R., Aebersold, R. 2006; 6 (23): 6146-6157


    Quantitative profiling of proteins, the direct effectors of nearly all biological functions, will undoubtedly complement technologies for the measurement of mRNA. Systematic proteomic measurement of the cell cycle is now possible by using stable isotopic labeling with isotope-coded affinity tag reagents and software tools for high-throughput analysis of LC-MS/MS data. We provide here the first such study achieving quantitative, global proteomic measurement of a time-course gene expression experiment in a model eukaryote, the budding yeast Saccharomyces cerevisiae, during the cell cycle. We sampled 48% of all predicted ORFs, and provide the data, including identifications, quantitations, and statistical measures of certainty, to the community in a sortable matrix. We do not detect significant concordance in the dynamics of the system over the time-course tested between our proteomic measurements and microarray measures collected from similarly treated yeast cultures. Our proteomic dataset therefore provides a necessary and complementary measure of eukaryotic gene expression, establishes a rich database for the functional analysis of S. cerevisiae proteins, and will enable further development of technologies for global proteomic analysis of higher eukaryotes.

    View details for DOI 10.1002/pmic.200600159

    View details for Web of Science ID 000242879000004

    View details for PubMedID 17133367

  • Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing JOURNAL OF PROTEOME RESEARCH Seebacher, J., Mallick, P., Zhang, N., Eddes, J. S., Aebersold, R., Gelb, M. H. 2006; 5 (9): 2270-2282


    Distance constraints in proteins and protein complexes provide invaluable information for calculation of 3D structures, identification of protein binding partners and localization of protein-protein contact sites. We have developed an integrative approach to identify and characterize such sites through the analysis of proteolytic products derived from proteins chemically cross-linked by isotopically coded cross-linkers using LC-MALDI tandem mass spectrometry and computer software. This method is specifically tailored toward the rapid analysis of low microgram amounts of proteins or multimeric protein complexes cross-linked with nonlabeled and deuterium-labeled bis-NHS ester cross-linking reagents (both commercially available and readily synthesized). Through labeling with [18O]water solvent and LC-MALDI analysis, the method further allows the possible distinction between Type 0 and Type 1 or Type 2 modified peptides (monolinks and looplinks or cross-links), although such a distinction is more readily made from analysis of tandem mass spectrometry data. When applied to the bacterial Colicin E7 DNAse/Im7 heterodimeric protein complex, 23 cross-links were identified including six intersubunit cross-links, all between residues that are close in space when examined in the context of the X-ray structure of the heterodimer. In addition, cross-links were successfully identified in five single subunit proteins, beta-lactoglobulin, cytochrome c, lysozyme, myoglobin, and ribonuclease A, establishing the generality of the approach.

    View details for DOI 10.1021/pr060154z

    View details for Web of Science ID 000240200700024

    View details for PubMedID 16944939

  • Mutagenesis of putative serine-threonine phosphorylation sites proximal to Arg255 of human cytochrome P450c17 does not selectively promote its 17,20-lyase activity FERTILITY AND STERILITY Souter, I., Munir, I., Mallick, P., Weitsman, S. R., Geller, D. H., Magoffin, D. A. 2006; 85: 1290-1299


    Objective: To investigate the role of serine-threonine phosphorylation on the activity of human P450c17. Design: In vitro study. Setting: Academic basic research laboratory. Patient(s): None. Intervention(s): P450c17 expression constructs with a FLAG-tag on either the C-terminus or N-terminus of the protein were generated. Human C-terminal FLAG-tagged P450c17 chromosomal DNA was subjected to site-directed mutagenesis. Serine 258 and threonine 260 each were mutated to alanine and aspartic acid. The mutant P450c17s were expressed in COS-7 cells, and the enzymatic activities were measured. Main outcome measure(s): 17alpha-Hydroxylase and C(17-20) lyase activities of human P450c17. Result(s): C-terminal FLAG-tagged P450c17 functioned indistinguishably from the wild-type P450c17. Mutants S258A, S258D, and T260D had significantly less 17alpha-hydroxylase and C(17-20) lyase activities than the wild type. Conclusion(s): Adding an epitope tag to the C-terminus of the P450c17 protein does not interfere with its activities and will be a useful tool to isolate human P450c17 protein from cultured cells. Phosphorylation of serine 258 but not threonine 260 may act as a physiologic regulator of both enzymatic activities through interaction with obligatory redox partners.

    View details for DOI 10.1016/j.fertnstert.2005.12.011

    View details for Web of Science ID 000236902300028

    View details for PubMedID 16616104

  • Signal maps for mass spectrometry-based comparative proteomics MOLECULAR & CELLULAR PROTEOMICS Prakash, A., Mallick, P., Whiteaker, J., Zhang, H. D., Paulovich, A., Flory, M., Lee, H., Aebersold, R., Schwikowski, B. 2006; 5 (3): 423-432


    Mass spectrometry-based proteomic experiments, in combination with liquid chromatography-based separation, can be used to compare complex biological samples across multiple conditions. These comparisons are usually performed on the level of protein lists generated from individual experiments. Unfortunately, given the current technologies, these lists typically cover only a small fraction of the total protein content, making global comparisons extremely limited. Recently, approaches have been suggested that are built on the comparison of computationally built feature lists instead of protein identifications. Although these approaches promise to capture a bigger spectrum of the proteins present in a complex mixture, their success is strongly dependent on the correctness of the identified features and the aligned retention times of these features across multiple experiments. In this experimental-computational study, we went one step further and performed the comparisons directly on the signal level. First, signal maps were constructed that associate the experimental signals across multiple experiments. Then, a feature detection algorithm used this integrated information to identify those features that are discriminating or common across multiple experiments. At the core of our approach is a score function that faithfully recognizes mass spectra from similar peptide mixtures and an algorithm that produces an optimal alignment (time warping) of the liquid chromatography experiments on the basis of raw MS signal, making minimal assumptions on the underlying data. We provide experimental evidence that suggests uniqueness and correctness of the resulting signal maps even on low accuracy mass spectrometers. These maps can be used for a variety of proteomic analyses. Here we illustrate the use of signal maps for the discovery of diagnostic biomarkers. An implementation of our algorithm is available on our Web server.

    View details for DOI 10.1074/mcp.M500133-MCP200

    View details for Web of Science ID 000236142800001

    View details for PubMedID 16269421
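
    The abstract above describes an optimal alignment (time warping) of liquid chromatography runs. As a rough illustration of the general technique only, the sketch below implements textbook dynamic time warping on two one-dimensional traces. It is a simplification under stated assumptions: in the paper each time point is a full mass spectrum and the local cost comes from the authors' own score function, whereas here each time point is a single number and the local cost is an absolute difference. The function name and example traces are hypothetical.

```python
def dtw_cost(a, b):
    """Minimum cumulative cost of warping sequence a onto sequence b.

    Classic dynamic programming: each cell adds the local mismatch to the
    cheapest of the three predecessor cells (advance a, advance b, or both).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            table[i][j] = local + min(table[i - 1][j],      # a advances
                                      table[i][j - 1],      # b advances
                                      table[i - 1][j - 1])  # both advance
    return table[n][m]

# Two toy elution traces; the second has a delayed start.
run_1 = [0, 1, 2, 1, 0]
run_2 = [0, 0, 1, 2, 1, 0]
shifted_cost = dtw_cost(run_1, run_2)
mismatch_cost = dtw_cost([0, 1], [0, 2])
```

    A retention-time shift alone produces zero warping cost here, while a genuine intensity difference does not, which is the property that makes warping useful before comparing runs.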

  • Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas GENOME BIOLOGY King, N. L., Deutsch, E. W., Ranish, J. A., Nesvizhskii, A. I., Eddes, J. S., Mallick, P., Eng, J., Desiere, F., Flory, M., Martin, D. B., Kim, B., Lee, H., Raught, B., Aebersold, R. 2006; 7 (11)


    We present the Saccharomyces cerevisiae PeptideAtlas composed from 47 diverse experiments and 4.9 million tandem mass spectra. The observed peptides align to 61% of Saccharomyces Genome Database (SGD) open reading frames (ORFs), 49% of the uncharacterized SGD ORFs, 54% of S. cerevisiae ORFs with a Gene Ontology annotation of 'molecular function unknown', and 76% of ORFs with Gene names. We highlight the use of this resource for data mining, construction of high quality lists for targeted proteomics, validation of proteins, and software development.

    View details for DOI 10.1186/gb-2006-7-11-r106

    View details for Web of Science ID 000243967000010

    View details for PubMedID 17101051

  • The PeptideAtlas project NUCLEIC ACIDS RESEARCH Desiere, F., Deutsch, E. W., King, N. L., Nesvizhskii, A. I., Mallick, P., Eng, J., Chen, S., Eddes, J., Loevenich, S. N., Aebersold, R. 2006; 34: D655-D658


    The completion of the sequencing of the human genome and the concurrent, rapid development of high-throughput proteomic methods have resulted in an increasing need for automated approaches to archive proteomic data in a repository that enables the exchange of data among researchers and also accurate integration with genomic data. PeptideAtlas addresses these needs by identifying peptides by tandem mass spectrometry (MS/MS), statistically validating those identifications and then mapping identified sequences to the genomes of eukaryotic organisms. A meaningful comparison of data across different experiments generated by different groups using different types of instruments is enabled by the implementation of a uniform analytic process. This uniform statistical validation ensures a consistent and high-quality set of peptide and protein identifications. The raw data from many diverse proteomic experiments are made available in the associated PeptideAtlas repository in several formats. Here we present a summary of our process and details about the Human, Drosophila and Yeast PeptideAtlas builds.

    View details for DOI 10.1093/nar/gkj040

    View details for Web of Science ID 000239307700138

    View details for PubMedID 16381952

  • A perspective on protein profiling of blood BJU INTERNATIONAL Katz, J. E., Mallick, P., Agus, D. B. 2005; 96 (4): 477-482
  • Scoring proteomes with proteotypic peptide probes NATURE REVIEWS MOLECULAR CELL BIOLOGY Kuster, B., Schirle, M., Mallick, P., Aebersold, R. 2005; 6 (7): 577-583


    Technologies for genome-wide analyses typically undergo a transition from a discovery phase to a scoring phase. In the discovery phase, the genomic universe is explored and all pertinent data are noted. In the scoring phase, relevant entities are screened to reveal groups of genes that are associated with specific biological processes or conditions. In this article, we propose that the transition from a discovery to a scoring phase is also essential, feasible and imminent for proteomics.

    View details for DOI 10.1038/nrm1683

    View details for Web of Science ID 000230245700014

    View details for PubMedID 15957003

  • High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry MOLECULAR & CELLULAR PROTEOMICS Zhang, H., Yi, E. C., Li, X. J., Mallick, P., Kelly-Spratt, K. S., Masselon, C. D., Camp, D. G., Smith, R. D., Kemp, C. J., Aebersold, R. 2005; 4 (2): 144-155


    It is expected that the composition of the serum proteome can provide valuable information about the state of the human body in health and disease and that this information can be extracted via quantitative proteomic measurements. Suitable proteomic techniques need to be sensitive, reproducible, and robust to detect potential biomarkers below the level of highly expressed proteins, generate data sets that are comparable between experiments and laboratories, and have high throughput to support statistical studies. Here we report a method for high throughput quantitative analysis of serum proteins. It consists of the selective isolation of peptides that are N-linked glycosylated in the intact protein, the analysis of these now deglycosylated peptides by liquid chromatography electrospray ionization mass spectrometry, and the comparative analysis of the resulting patterns. By focusing selectively on a few formerly N-linked glycopeptides per serum protein, the complexity of the analyte sample is significantly reduced and the sensitivity and throughput of serum proteome analysis are increased compared with the analysis of total tryptic peptides from unfractionated samples. We provide data that document the performance of the method and show that sera from untreated normal mice and genetically identical mice with carcinogen-induced skin cancer can be unambiguously discriminated using unsupervised clustering of the resulting peptide patterns. We further identify, by tandem mass spectrometry, some of the peptides that were consistently elevated in cancer mice compared with their control littermates.

    View details for DOI 10.1074/mcp.M400090-MCP200

    View details for Web of Science ID 000227381300004

    View details for PubMedID 15608340

  • Finding protein domain boundaries: an automated, non-homology-based method IEEE Intelligent Systems Gurbaxani BM, Mallick P 2005; Nov-Dec (6): 26-33
  • Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry GENOME BIOLOGY Desiere, F., Deutsch, E. W., Nesvizhskii, A. I., Mallick, P., King, N. L., Eng, J. K., Aderem, A., Boyle, R., Brunner, E., Donohoe, S., Fausto, N., Hafen, E., Hood, L., Katze, M. G., Kennedy, K. A., Kregenow, F., Lee, H. K., Lin, B. Y., Martin, D., Ranish, J. A., Rawlings, D. J., Samelson, L. E., Shiio, Y., Watts, J. D., Wollscheid, B., Wright, M. E., Yan, W., Yang, L. H., Yi, E. C., Zhang, H., Aebersold, R. 2005; 6 (1)


    A crucial aim upon the completion of the human genome is the verification and functional annotation of all predicted genes and their protein products. Here we describe the mapping of peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments. Furthermore, we demonstrate that peptide identifications obtained from high-throughput proteomics can be integrated on a large scale with the human genome. This resource could serve as an expandable repository for MS-derived proteome information.

    View details for Web of Science ID 000226337200015

    View details for PubMedID 15642101

  • PFIT and PFRIT: Bioinformatic algorithms for detecting glycosidase function from structure and sequence PROTEIN SCIENCE Kleiger, G., Panina, E. M., Mallick, P., Eisenberg, D. 2004; 13 (1): 221-229


    The identification of the enzymes involved in the metabolism of simple and complex carbohydrates presents one bioinformatic challenge in the post-genomic era. Here, we present the PFIT and PFRIT algorithms for identifying those proteins adopting the alpha/beta barrel fold that function as glycosidases. These algorithms are based on the observation that proteins adopting the alpha/beta barrel fold share positions in their tertiary structures having equivalent sets of atomic interactions. These are conserved tertiary interaction positions, which have been implicated in both structure and function. Glycosidases adopting the alpha/beta barrel fold share more conserved tertiary interactions than alpha/beta barrel proteins having other functions. The enrichment pattern of conserved tertiary interactions in the glycosidases is the information that PFIT and PFRIT use to predict whether any given alpha/beta barrel will function as a glycosidase or not. Using as a test set a database of 19 glycosidase and 45 nonglycosidase alpha/beta barrel proteins with low sequence similarity, PFIT and PFRIT can correctly predict glycosidase function for 84% of the proteins known to function as glycosidases. PFIT and PFRIT incorrectly predict glycosidase function for 25% of the nonglycosidases. The program PSI-BLAST can also correctly identify 84% of the 19 glycosidases, however, it incorrectly predicts glycosidase function for 50% of the nonglycosidases (twofold greater than PFIT and PFRIT). Overall, we demonstrate that the structure-based PFIT and PFRIT algorithms are both more selective and sensitive for predicting glycosidase function than the sequence-based PSI-BLAST algorithm.

    View details for DOI 10.1110/ps.03274104

    View details for Web of Science ID 000187587700022

    View details for PubMedID 14691237

  • Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach GENOME BIOLOGY Strong, M., Mallick, P., Pellegrini, M., Thompson, M. J., Eisenberg, D. 2003; 4 (9)


    The genome of Mycobacterium tuberculosis was analyzed using recently developed computational approaches to infer protein function and protein linkages. We evaluated and employed a method to infer genes likely to belong to the same operon, as judged by the nucleotide distance between genes in the same genomic orientation, and combined this method with those of the Rosetta Stone, Phylogenetic Profile and conserved Gene Neighbor computational methods for the inference of protein function.

    View details for Web of Science ID 000185048100012

    View details for PubMedID 12952538
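
    The operon inference described in the abstract above rests on two observable properties: genes share a genomic orientation and sit close together. A minimal sketch of that rule follows. The `Gene` record, the `predict_operons` function, the 50-nucleotide default threshold, and the example coordinates are all assumptions made for illustration; the abstract does not state the threshold the authors evaluated.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # first nucleotide position
    end: int     # last nucleotide position
    strand: str  # "+" or "-"

def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent genes that share a strand and lie within max_gap nucleotides."""
    operons: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            previous = operons[-1][-1]
            same_strand = gene.strand == previous.strand
            close_enough = gene.start - previous.end <= max_gap
            if same_strand and close_enough:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g.name for g in group] for group in operons]

# Hypothetical gene coordinates.
genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 120, 300, "+"),   # 20 nt after a, same strand: joins a
    Gene("c", 500, 700, "+"),   # 200 nt gap: starts a new group
    Gene("d", 710, 900, "-"),   # close to c but opposite strand
]
groups = predict_operons(genes)
```

    Predictions of this kind are only one line of evidence; the abstract combines them with the Rosetta Stone, Phylogenetic Profile and conserved Gene Neighbor methods.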

  • The directional atomic solvation energy: An atom-based potential for the assignment of protein sequences to known folds PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Mallick, P., Weiss, R., Eisenberg, D. 2002; 99 (25): 16041-16046


    The Directional Atomic Solvation EnergY (DASEY) is an atom-based description of the environment of an amino acid position within a known 3D protein structure. The DASEY has been developed to align and score a probe amino acid sequence to a library of template protein structures for fold assignment. DASEY is computed by summing the atomic solvation parameters of atoms falling within a tetrahedral sector, or petal, extending 16 Å along each of the four bond axes of each alpha-carbon atom of the protein. The DASEY discriminates between pairs of structurally equivalent positions and random pairs in protein structures sharing a fold but belonging to different superfamilies, unlike some previous descriptors of protein environments, such as buried area. Furthermore, the DASEY values have characteristic patterns of residue replacement, an essential feature of a successful fold assignment method. Benchmarking fold assignment with DASEY achieves coverage of 56% of sequences with 90% accuracy when probe sequences are matched to protein structural templates belonging to the same fold but to a different superfamily, an improvement of greater than 200% over a previous method.

    View details for DOI 10.1073/pnas.252626399

    View details for Web of Science ID 000179783400041

    View details for PubMedID 12461172

  • Genomic evidence that the intracellular proteins of archaeal microbes contain disulfide bonds PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Mallick, P., Boutz, D. R., Eisenberg, D., Yeates, T. O. 2002; 99 (15): 9679-9684


    Disulfide bonds have only rarely been found in intracellular proteins. That pattern is consistent with the chemically reducing environment inside the cells of well-studied organisms. However, recent experiments and new calculations based on genomic data of archaea provide striking contradictions to this pattern. Our results indicate that the intracellular proteins of certain hyperthermophilic archaea, especially the crenarchaea Pyrobaculum aerophilum and Aeropyrum pernix, are rich in disulfide bonds. This finding implicates disulfide bonding in stabilizing many thermostable proteins and points to novel chemical environments inside these microbes. These unexpected results illustrate the wealth of biochemical insights available from the growing reservoir of genomic data.

    View details for DOI 10.1073/pnas.142310499

    View details for Web of Science ID 000177042400017

    View details for PubMedID 12107280

  • A modeled hydrophobic domain on the TCL1 oncoprotein mediates association with AKT at the cytoplasmic membrane BIOCHEMISTRY French, S. W., Shen, R. R., Koh, P. J., Malone, C. S., Mallick, P., Teitell, M. A. 2002; 41 (20): 6376-6382


    AKT has a critical role in relaying cell survival and proliferation signals initiated by ligand binding to surface receptors in mammalian cells. Induction of AKT serine/threonine kinase activity is augmented by the T-cell leukemia-1 (TCL1) oncoprotein through a physical association requiring the AKT pleckstrin homology domain. Here, we used molecular modeling and identified an exposed hydrophobic patch composed of two discontinuous amino acid stretches near one end of the TCL1 beta-barrel that was required for a TCL1-AKT association. Site-directed mutations of this region did not affect TCL1 secondary structure, yet they disrupted interactions with AKT. This region was found in other members of the TCL1 oncoprotein family, such as TCL1b and MTCP1, and suggested a conserved, novel AKT binding domain. Interestingly, TCL1 and AKT co-localize in multiple cell compartments, but only extracts from the plasma membrane stimulate optimal complex formation in vitro. Identification of an AKT binding domain on TCL1 is an important step in deciphering the complex interactions that regulate AKT kinase activity in lymphocyte development and neoplasia within the immune system.

    View details for DOI 10.1021/bi016068o

    View details for Web of Science ID 000175651400019

    View details for PubMedID 12009899

  • GXXXG and AXXXA: Common alpha-helical interaction motifs in proteins, particularly in extremophiles BIOCHEMISTRY Kleiger, G., Grothe, R., Mallick, P., Eisenberg, D. 2002; 41 (19): 5990-5997


    The GXXXG motif is a frequently occurring sequence of residues that is known to favor helix-helix interactions in membrane proteins. Here we show that the GXXXG motif is also prevalent in soluble proteins whose structures have been determined. Some 152 proteins from a non-redundant PDB set contain at least one alpha-helix with the GXXXG motif, 41 ± 9% more than expected if glycine residues were uniformly distributed in those alpha-helices. More than 50% of the GXXXG-containing alpha-helices participate in helix-helix interactions. In fact, 26 of those helix-helix interactions are structurally similar to the helix-helix interaction of the glycophorin A dimer, where two transmembrane helices associate to form a dimer stabilized by the GXXXG motif. As for the glycophorin A structure, we find backbone-to-backbone atomic contacts of the Cα-H···O type in each of these 26 helix-helix interactions that display the stereochemical hallmarks of hydrogen bond formation. These glycophorin A-like helix-helix interactions are enriched in the general set of helix-helix interactions containing the GXXXG motif, suggesting that the inferred Cα-H···O hydrogen bonds stabilize the helix-helix interactions. In addition to the GXXXG motif, some 808 proteins from the non-redundant PDB set contain at least one alpha-helix with the AXXXA motif (30 ± 3% greater than expected). Both the GXXXG and AXXXA motifs occur frequently in predicted alpha-helices from 24 fully sequenced genomes. Occurrence of the AXXXA motif is enhanced to a greater extent in thermophiles than in mesophiles, suggesting that helical interaction based on the AXXXA motif may be a common mechanism of thermostability in protein structures. We conclude that the GXXXG sequence motif stabilizes helix-helix interactions in proteins, and that the AXXXA sequence motif also stabilizes the folded state of proteins.

    View details for DOI 10.1021/bi0200763

    View details for Web of Science ID 000175547000007

    View details for PubMedID 11993993
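
    As an illustration only (not the authors' code), the kind of motif count described in the abstract above can be sketched in a few lines. The function names are hypothetical, and the null model (residues placed uniformly at random within a single helix) is an assumed simplification of the paper's "uniformly distributed" expectation.

```python
import re


def find_motif_positions(helix_seq, residue="G", spacing=3):
    """Return start indices of residue-X{spacing}-residue motifs (e.g. GXXXG).

    A lookahead is used so overlapping motifs such as GAAAGAAAG are all found.
    """
    pattern = re.compile(f"(?={residue}.{{{spacing}}}{residue})")
    return [m.start() for m in pattern.finditer(helix_seq)]


def expected_motif_count(helix_seq, residue="G", spacing=3):
    """Expected motif count if the residue were placed uniformly at random.

    Assumed null model: the k copies of the residue are shuffled among the
    n positions of this one helix, so any two fixed positions are both
    occupied with probability k(k-1) / (n(n-1)).
    """
    n = len(helix_seq)
    windows = n - spacing - 1  # number of (i, i + spacing + 1) position pairs
    if windows <= 0:
        return 0.0
    k = helix_seq.count(residue)
    return windows * k * (k - 1) / (n * (n - 1))


if __name__ == "__main__":
    helix = "GAAAGAAAG"  # toy sequence, not taken from the paper
    observed = len(find_motif_positions(helix))
    expected = expected_motif_count(helix)
    print(observed, round(expected, 3))
```

    Comparing observed and expected counts summed over many helices gives an enrichment ratio of the kind the abstract reports; passing `residue="A"` applies the same scan to the AXXXA motif.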

  • Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions and interactions. Proteins and Proteomics: A Laboratory Manual. Cold Spring Harbor Laboratory Press: Parag Mallick, Edward Marcotte 2002: Chapter 11

  • The 1.7 angstrom crystal structure of BPI: A study of how two dissimilar amino acid sequences can adopt the same fold JOURNAL OF MOLECULAR BIOLOGY Kleiger, G., Beamer, L. J., Grothe, R., Mallick, P., Eisenberg, D. 2000; 299 (4): 1019-1034


    We have extended the resolution of the crystal structure of human bactericidal/permeability-increasing protein (BPI) to 1.7 Å. BPI has two domains with the same fold, but with little sequence similarity. To understand the similarity in structure of the two domains, we compare the corresponding residue positions in the two domains by the method of 3D-1D profiles. A 3D-1D profile is a string formed by assigning each position in the 3D structure to one of 18 environment classes. The environment classes are defined by the local secondary structure, the area of the residue which is buried from solvent, and the fraction of the area buried by polar atoms. A structural alignment between the two BPI domains was used to compare the 3D-1D environments of structurally equivalent positions. Greater than 31% of the aligned positions have conserved 3D-1D environments, but only 13% have conserved residue identities. Analysis of the 3D-1D environmentally conserved positions helps to identify pairs of residues likely to be important in conserving the fold, regardless of the residue similarity. We find examples of 3D-1D environmentally conserved positions with dissimilar residues which nevertheless play similar structural roles. To generalize our findings, we analyzed four other proteins with similar structures yet dissimilar sequences. Together, these examples show that aligned pairs of dissimilar residues often share similar structural roles, stabilizing dissimilar sequences in the same fold.

    View details for Web of Science ID 000087680400016

    View details for PubMedID 10843855

  • Selecting protein targets for structural genomics of Pyrobaculum aerophilum: Validating automated fold assignment methods by using binary hypothesis testing PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Mallick, P., Goodwill, K. E., Fitz-Gibbon, S., Miller, J. H., Eisenberg, D. 2000; 97 (6): 2450-2455


    Three-dimensional protein folds were assigned to all ORFs of the recently sequenced genome of the hyperthermophilic archaeon Pyrobaculum aerophilum. Binary hypothesis testing was used to estimate a confidence level for each assignment. A separate test was conducted to assign a probability for whether each sequence has a novel fold, i.e., one that is not yet represented in the experimental database of known structures. Of the 2,130 predicted nontransmembrane proteins in this organism, 916 matched a fold at a cumulative 90% confidence level, and 245 could be assigned at a 99% confidence level. Likewise, 286 proteins were predicted to have a previously unobserved fold with a 90% confidence level, and 14 at a 99% confidence level. These statistically based tools are combined with homology searches against the Online Mendelian Inheritance in Man (OMIM) human genetics database and other protein databases for the selection of attractive targets for crystallographic or NMR structure determination. Results of these studies have been collated and placed at A_HOME/, the University of California, Los Angeles-Department of Energy Pyrobaculum aerophilum web site.

    View details for Web of Science ID 000085941400011

    View details for PubMedID 10706641

  • The accidental bioinformaticist JOURNAL OF CELLULAR BIOCHEMISTRY Mallick, P. 2000; 80 (2): 208-209

    View details for Web of Science ID 000166022600007

    View details for PubMedID 11074589