I am advancing the vision of understanding biology at the proteoform level, peering into the cellular machinery in a way that reveals precisely which molecular form of a protein is acting in a biological system. Recently, I have been working in Emma Lundberg’s lab to understand how the expression of these molecules varies between individual cells in space and time. Emma Lundberg’s group has extensive experience using microscopy to image this cell-to-cell heterogeneity in protein expression, and joining her lab has deepened my expertise in integrating datasets to perform innovative analyses of single-cell protein expression. I hope to extend this work to single-cell proteoform expression, characterizing the heterogeneity of proteoforms and the flux between them in space and time, and uncovering the fundamental insights about human biology that these data may reveal.

Honors & Awards

  • Gary Parr Memorial Award, University of Wisconsin - Madison (2018)
  • Richard and Joan Hartl Award for Research Excellence in Analytical Chemistry, University of Wisconsin - Madison (2017)
  • Computation and Informatics in Biology and Medicine, Predoctoral Trainee, University of Wisconsin - Madison (2014-2017)
  • Stephen Morton Research Award, University of Wisconsin - Madison (2015)
  • Gerhard T. Alexis Scholarship, Gustavus Adolphus College (2011)

All Publications

  • The Blood Proteoform Atlas: A reference map of proteoforms in human hematopoietic cells. Science (New York, N.Y.) Melani, R. D., Gerbasi, V. R., Anderson, L. C., Sikora, J. W., Toby, T. K., Hutton, J. E., Butcher, D. S., Negrao, F., Seckler, H. S., Srzentic, K., Fornelli, L., Camarillo, J. M., LeDuc, R. D., Cesnik, A. J., Lundberg, E., Greer, J. B., Fellers, R. T., Robey, M. T., DeHart, C. J., Forte, E., Hendrickson, C. L., Abbatiello, S. E., Thomas, P. M., Kokaji, A. I., Levitsky, J., Kelleher, N. L. 2022; 375 (6579): 411-418


    Human biology is tightly linked to proteins, yet most measurements do not precisely determine alternatively spliced sequences or posttranslational modifications. Here, we present the primary structures of ~30,000 unique proteoforms, nearly 10 times more than in previous studies, expressed from 1690 human genes across 21 cell types and plasma from human blood and bone marrow. The results, compiled in the Blood Proteoform Atlas (BPA), indicate that proteoforms better describe protein-level biology and are more specific indicators of differentiation than their corresponding proteins, which are more broadly expressed across cell types. We demonstrate the potential for clinical application by interrogating the BPA in the context of liver transplantation and identifying cell and proteoform signatures that distinguish normal graft function from acute rejection and other causes of graft dysfunction.

    View details for DOI 10.1126/science.aaz5284

    View details for PubMedID 35084980

  • MetaNetwork Enhances Biological Insights from Quantitative Proteomics Differences by Combining Clustering and Enrichment Analyses. Journal of Proteome Research Carr, A. V., Frey, B. L., Scalf, M., Cesnik, A. J., Rolfs, Z., Pike, K. A., Yang, B., Keller, M. P., Jarrard, D. F., Shortreed, M. R., Smith, L. M. 2022


    Interpreting proteomics data remains challenging due to the large number of proteins that are quantified by modern mass spectrometry methods. Weighted gene correlation network analysis (WGCNA) can identify groups of biologically related proteins using only protein intensity values by constructing protein correlation networks. However, WGCNA is not widespread in proteomic analyses due to challenges in implementing workflows. To facilitate the adoption of WGCNA by the proteomics field, we created MetaNetwork, an open-source, R-based application to perform sophisticated WGCNA workflows with no coding skill requirements for the end user. We demonstrate MetaNetwork's utility by employing it to identify groups of proteins associated with prostate cancer from a proteomic analysis of tumor and adjacent normal tissue samples. We found a decrease in cytoskeleton-related protein expression, a known hallmark of prostate tumors. We further identified changes in module eigenproteins indicative of dysregulation in protein translation and trafficking pathways. These results demonstrate the value of using MetaNetwork to improve the biological interpretation of quantitative proteomics experiments with 15 or more samples.

    View details for DOI 10.1021/acs.jproteome.1c00756

    View details for PubMedID 35073098

  • Proteomics Standards Initiative's ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms. Journal of Proteome Research LeDuc, R. D., Deutsch, E. W., Binz, P. A., Fellers, R. T., Cesnik, A. J., Klein, J. A., Van Den Bossche, T., Gabriels, R., Yalavarthi, A., Perez-Riverol, Y., Carver, J., Bittremieux, W., Kawano, S., Pullman, B., Bandeira, N., Kelleher, N. L., Thomas, P. M., Vizcaíno, J. A. 2022


    It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at

    View details for DOI 10.1021/acs.jproteome.1c00771

    View details for PubMedID 35290070

  • Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature Mahdessian, D., Cesnik, A. J., Gnann, C., Danielsson, F., Stenstrom, L., Arif, M., Zhang, C., Le, T., Johansson, F., Schutten, R., Backstrom, A., Axelsson, U., Thul, P., Cho, N. H., Carja, O., Uhlen, M., Mardinoglu, A., Stadler, C., Lindskog, C., Ayoglu, B., Leonetti, M. D., Ponten, F., Sullivan, D. P., Lundberg, E. 2021; 590 (7847): 649–54


    The cell cycle, over which cells grow and divide, is a fundamental process of life. Its dysregulation has devastating consequences, including cancer (refs. 1-3). The cell cycle is driven by precise regulation of proteins in time and space, which creates variability between individual proliferating cells. To our knowledge, no systematic investigations of such cell-to-cell proteomic variability exist. Here we present a comprehensive, spatiotemporal map of human proteomic heterogeneity by integrating proteomics at subcellular resolution with single-cell transcriptomics and precise temporal measurements of individual cells in the cell cycle. We show that around one-fifth of the human proteome displays cell-to-cell variability, identify hundreds of proteins with previously unknown associations with mitosis and the cell cycle, and provide evidence that several of these proteins have oncogenic functions. Our results show that cell cycle progression explains less than half of all cell-to-cell variability, and that most cycling proteins are regulated post-translationally, rather than by transcriptomic cycling. These proteins are disproportionately phosphorylated by kinases that regulate cell fate, whereas non-cycling proteins that vary between cells are more likely to be modified by kinases that regulate metabolism. This spatially resolved proteomic map of the cell cycle is integrated into the Human Protein Atlas and will serve as a resource for accelerating molecular studies of the human cell cycle and cell proliferation.

    View details for DOI 10.1038/s41586-021-03232-9

    View details for PubMedID 33627808

  • Illuminating non-genetic cellular heterogeneity with spatial proteomics. Trends in Cancer Gnann, C., Cesnik, A. J., Lundberg, E. 2021; 7 (4): 278-282


    Cellular heterogeneity is an important biological phenomenon observed across space and time in human tissues. Imaging-based spatial proteomic technologies can provide fruitful new readouts of phenotypic states for individual cells at subcellular resolution, which may help unravel the roles of non-genetic cellular heterogeneity in tumorigenesis and drug resistance.

    View details for DOI 10.1016/j.trecan.2020.12.006

  • Mapping the nucleolar proteome reveals a spatiotemporal organization related to intrinsic protein disorder. Molecular Systems Biology Stenstrom, L., Mahdessian, D., Gnann, C., Cesnik, A. J., Ouyang, W., Leonetti, M. D., Uhlen, M., Cuylen-Haering, S., Thul, P. J., Lundberg, E. 2020; 16 (8): e9469


    The nucleolus is essential for ribosome biogenesis and is involved in many other cellular functions. We performed a systematic spatiotemporal dissection of the human nucleolar proteome using confocal microscopy. In total, 1,318 nucleolar proteins were identified; 287 were localized to fibrillar components, and 157 were enriched along the nucleoplasmic border, indicating a potential fourth nucleolar subcompartment: the nucleoli rim. We found 65 nucleolar proteins (36 uncharacterized) to relocate to the chromosomal periphery during mitosis. Interestingly, we observed temporal partitioning into two recruitment phenotypes: early (prometaphase) and late (after metaphase), suggesting phase-specific functions. We further show that the expression of MKI67 is critical for this temporal partitioning. We provide the first proteome-wide analysis of intrinsic protein disorder for the human nucleolus and show that nucleolar proteins in general, and mitotic chromosome proteins in particular, have significantly higher intrinsic disorder level compared to cytosolic proteins. In summary, this study provides a comprehensive and essential resource of spatiotemporal expression data for the nucleolar proteome as part of the Human Protein Atlas.

    View details for DOI 10.15252/msb.20209469

    View details for PubMedID 32744794

  • Comprehensive Detection of Single Amino Acid Variants and Evaluation of Their Deleterious Potential in a PANC-1 Cell Line. Journal of Proteome Research Tan, Z., Zhu, J., Stemmer, P. M., Sun, L., Yang, Z., Schultz, K., Gaffrey, M. J., Cesnik, A. J., Yi, X., Hao, X., Shortreed, M. R., Shi, T., Lubman, D. M. 2020


    Identifying single amino acid variants (SAAVs) in cancer is critical for precision oncology. Several advanced algorithms are now available to identify SAAVs, but attempts to combine different algorithms and optimize them on large data sets to achieve a more comprehensive coverage of SAAVs have not been implemented. Herein, we report an expanded detection of SAAVs in the PANC-1 cell line using three different strategies, which results in the identification of 540 SAAVs in the mass spectrometry data. Among the set of 540 SAAVs, 79 are evaluated as deleterious SAAVs based on analysis using the novel AssVar software in which one of the driver mutations found in each protein of KRAS, TP53, and SLC37A4 is further validated using independent selected reaction monitoring (SRM) analysis. Our study represents the most comprehensive discovery of SAAVs to date and the first large-scale detection of deleterious SAAVs in the PANC-1 cell line. This work may serve as the basis for future research in pancreatic cancer and personal immunotherapy and treatment.

    View details for DOI 10.1021/acs.jproteome.9b00840

    View details for PubMedID 32058723

  • Spritz: A Proteogenomic Database Engine. Journal of Proteome Research Cesnik, A. J., Miller, R. M., Ibrahim, K., Lu, L., Millikin, R. J., Shortreed, M. R., Frey, B. L., Smith, L. M. 2020


    Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz, an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.

    View details for DOI 10.1021/acs.jproteome.0c00407

    View details for PubMedID 32967423

  • Analysis of the Human Protein Atlas Image Classification competition. Nature Methods Ouyang, W., Winsnes, C. F., Hjelmare, M., Cesnik, A. J., Åkesson, L., Xu, H., Sullivan, D. P., Dai, S., Lan, J., Jinmo, P., Galib, S. M., Henkel, C., Hwang, K., Poplavskiy, D., Tunguz, B., Wolfinger, R. D., Gu, Y., Li, C., Xie, J., Buslov, D., Fironov, S., Kiselev, A., Panchenko, D., Cao, X., Wei, R., Wu, Y., Zhu, X., Tseng, K. L., Gao, Z., Ju, C., Yi, X., Zheng, H., Kappel, C., Lundberg, E. 2019; 16 (12): 1254–61


    Pinpointing subcellular protein localizations from microscopy images is easy to the trained eye, but challenging to automate. Based on the Human Protein Atlas image collection, we held a competition to identify deep learning solutions to solve this task. Challenges included training on highly imbalanced classes and predicting multiple labels per image. Over 3 months, 2,172 teams participated. Despite convergence on popular networks and training techniques, there was considerable variety among the solutions. Participants applied strategies for modifying neural networks and loss functions, augmenting data and using pretrained networks. The winning models far outperformed our previous effort at multi-label classification of protein localization patterns by ~20%. These models can be used as classifiers to annotate new images, feature extractors to measure pattern similarity or pretrained networks for a wide range of biological applications.

    View details for DOI 10.1038/s41592-019-0658-6

    View details for PubMedID 31780840

  • Comprehensive in vivo identification of the c-Myc mRNA protein interactome using HyPR-MS. RNA Spiniello, M., Steinbrink, M. I., Cesnik, A. J., Miller, R. M., Scalf, M., Shortreed, M. R., Smith, L. M. 2019; 25: 1337–1352

    View details for DOI 10.1261/rna.072157.119

  • HyPR-MS for Multiplexed Discovery of MALAT1, NEAT1, and NORAD lncRNA Protein Interactomes. Journal of Proteome Research Spiniello, M., Knoener, R. A., Steinbrink, M. I., Yang, B., Cesnik, A. J., Buxton, K. E., Scalf, M., Jarrard, D. F., Smith, L. M. 2018; 17 (9): 3022-3038


    RNA-protein interactions are integral to the regulation of gene expression. RNAs have diverse functions and the protein interactomes of individual RNAs vary temporally, spatially, and with physiological context. These factors make the global acquisition of individual RNA-protein interactomes an essential endeavor. Although techniques have been reported for discovery of the protein interactomes of specific RNAs they are largely laborious, costly, and accomplished singly in individual experiments. We developed HyPR-MS for the discovery and analysis of the protein interactomes of multiple RNAs in a single experiment while also reducing design time and improving efficiencies. Presented here is the application of HyPR-MS to simultaneously and selectively isolate the interactomes of lncRNAs MALAT1, NEAT1, and NORAD. Our analysis features the proteins that potentially contribute to both known and previously undiscovered roles of each lncRNA. This platform provides a powerful new multiplexing tool for the efficient and cost-effective elucidation of specific RNA-protein interactomes.

    View details for DOI 10.1021/acs.jproteome.8b00189

    View details for Web of Science ID 000444364700010

    View details for PubMedID 29972301

    View details for PubMedCentralID PMC6425737

  • Long Noncoding RNAs AC009014.3 and Newly Discovered XPLAID Differentiate Aggressive and Indolent Prostate Cancers. Translational Oncology Cesnik, A. J., Yang, B., Truong, A., Etheridge, T., Spiniello, M., Steinbrink, M. I., Shortreed, M. R., Frey, B. L., Jarrard, D. F., Smith, L. M. 2018; 11 (3): 808-814


    The molecular mechanisms underlying aggressive versus indolent disease are not fully understood. Recent research has implicated a class of molecules known as long noncoding RNAs (lncRNAs) in tumorigenesis and progression of cancer. Our objective was to discover lncRNAs that differentiate aggressive and indolent prostate cancers. We analyzed paired tumor and normal tissues from six aggressive Gleason score (GS) 8-10 and six indolent GS 6 prostate cancers. Extracted RNA was split for poly(A)+ and ribosomal RNA depletion library preparations, followed by RNA sequencing (RNA-Seq) using an Illumina HiSeq 2000. We developed an RNA-Seq data analysis pipeline to discover and quantify these molecules. Candidate lncRNAs were validated using RT-qPCR on 87 tumor tissue samples: 28 (GS 6), 28 (GS 3+4), 6 (GS 4+3), and 25 (GS 8-10). Statistical correlations between lncRNAs and clinicopathologic variables were tested using ANOVA. The 43 differentially expressed (DE) lncRNAs between aggressive and indolent prostate cancers included 12 annotated and 31 novel lncRNAs. The top six DE lncRNAs were selected based on large, consistent fold-changes in the RNA-Seq results. Three of these candidates passed RT-qPCR validation, including AC009014.3 (P < .001 in tumor tissue) and a newly discovered X-linked lncRNA named XPLAID (P = .049 in tumor tissue and P = .048 in normal tissue). XPLAID and AC009014.3 show promise as prognostic biomarkers. We discovered several dozen lncRNAs that distinguish aggressive and indolent prostate cancers, of which four were validated using RT-qPCR. The investigation into their biology is ongoing.

    View details for DOI 10.1016/j.tranon.2018.04.002

    View details for Web of Science ID 000433287500029

    View details for PubMedID 29723810

    View details for PubMedCentralID PMC6154865

  • ProForma: A Standard Proteoform Notation. Journal of Proteome Research LeDuc, R. D., Schwammle, V., Shortreed, M. R., Cesnik, A. J., Solntsev, S. K., Shaw, J. B., Martin, M. J., Vizcaino, J. A., Alpi, E., Danis, P., Kelleher, N. L., Smith, L. M., Ge, Y., Agar, J. N., Chamot-Rooke, J., Loo, J. A., Pasa-Tolic, L., Tsybin, Y. O. 2018; 17 (3): 1321-1325


    The Consortium for Top-Down Proteomics (CTDP) proposes a standardized notation, ProForma, for writing the sequence of fully characterized proteoforms. ProForma provides a means to communicate any proteoform by writing the amino acid sequence using standard one-letter notation and specifying modifications or unidentified mass shifts within brackets following certain amino acids. The notation is unambiguous, human-readable, and can easily be parsed and written by bioinformatic tools. This system uses seven rules and supports a wide range of possible use cases, ensuring compatibility and reproducibility of proteoform annotations. Standardizing proteoform sequences will simplify storage, comparison, and reanalysis of proteomic studies, and the Consortium welcomes input and contributions from the research community on the continued design and maintenance of this standard.

    View details for DOI 10.1021/acs.jproteome.7b00851

    View details for Web of Science ID 000426804300036

    View details for PubMedID 29397739

    View details for PubMedCentralID PMC5837035

  • Expanding Proteoform Identifications in Top-Down Proteomic Analyses by Constructing Proteoform Families. Analytical Chemistry Schaffer, L. V., Shortreed, M. R., Cesnik, A. J., Frey, B. L., Solntsev, S. K., Scalf, M., Smith, L. M. 2018; 90 (2): 1325-1333


    In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data. Here, we have adapted it to provide identifications of proteoform masses in precursor MS1 spectra of top-down data, supplementing the top-down identifications obtained using the MS2 fragmentation data. Proteoform Suite performs mass calibration using high-scoring top-down identifications and identifies additional proteoforms using calibrated, accurate intact masses. Proteoform families, the set of proteoforms from a given gene, are constructed and visualized from proteoforms identified by both top-down and intact-mass analyses. Using this strategy, we constructed proteoform families and identified 1861 proteoforms in yeast lysate, yielding an approximately 40% increase over the original 1291 proteoform identifications observed using traditional top-down analysis alone.

    View details for DOI 10.1021/acs.analchem.7b04221

    View details for Web of Science ID 000423011600040

    View details for PubMedID 29227670

    View details for PubMedCentralID PMC5807004

  • Identification and Quantification of Murine Mitochondrial Proteoforms Using an Integrated Top-Down and Intact-Mass Strategy. Journal of Proteome Research Schaffer, L. V., Rensvold, J. W., Shortreed, M. R., Cesnik, A. J., Jochem, A., Scalf, M., Frey, B. L., Pagliarini, D. J., Smith, L. M. 2018; 17: 3526–3536

  • Proteoform Suite: Software for Constructing, Quantifying, and Visualizing Proteoform Families. Journal of Proteome Research Cesnik, A. J., Shortreed, M. R., Schaffer, L. V., Knoener, R. A., Frey, B. L., Scalf, M., Solntsev, S. K., Dai, Y., Gasch, A. P., Smith, L. M. 2018; 17 (1): 568-578


    We present an open-source, interactive program named Proteoform Suite that uses proteoform mass and intensity measurements from complex biological samples to identify and quantify proteoforms. It constructs families of proteoforms derived from the same gene, assesses proteoform function using gene ontology (GO) analysis, and enables visualization of quantified proteoform families and their changes. It is applied here to reveal systemic proteoform variations in the yeast response to salt stress.

    View details for DOI 10.1021/acs.jproteome.7b00685

    View details for Web of Science ID 000419749800052

    View details for PubMedID 29195273

    View details for PubMedCentralID PMC5770237

  • Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys. BMC Genomics Proffitt, J., Glenn, J., Cesnik, A. J., Jadhav, A., Shortreed, M. R., Smith, L. M., Kavanagh, K., Cox, L. A., Olivier, M. 2017; 18: 877


    Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq to improve proteomics search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data are capable of identifying unannotated genetic and splice variants while simultaneously reducing the number of comparisons to only those transcripts actively expressed in the tissue. We collected RNA-Seq and proteomic data from 10 vervet monkey liver samples and used the RNA-Seq data to curate sample-specific search databases which were analyzed in the program Morpheus. We compared these results against those from a search database generated from the reference vervet genome. A total of 284 previously unannotated splice junctions were predicted by the RNA-Seq data, 92 of which were confirmed by peptide spectral matches. More than half (53/92) of these unannotated splice variants had orthologs in other non-human primates, suggesting that failure to match these peptides in the reference analyses likely arose from incomplete gene model information. The sample-specific databases also identified 101 unique peptides containing single amino acid substitutions which were missed by the reference database. Because the sample-specific searches were restricted to actively expressed transcripts, the search databases were smaller, more computationally efficient, and identified more peptides at the empirically derived 1% false discovery rate. Proteogenomic approaches are ideally suited to facilitate the discovery and annotation of proteins in less widely studied animal models such as non-human primates. We expect that these approaches will help to improve existing genome annotations of non-human primate species such as vervet.

    View details for DOI 10.1186/s12864-017-4279-0

    View details for Web of Science ID 000415108800002

    View details for PubMedID 29132314

    View details for PubMedCentralID PMC5683380

  • Elucidating Escherichia coli Proteoform Families Using Intact-Mass Proteomics and a Global PTM Discovery Database. Journal of Proteome Research Dai, Y., Shortreed, M. R., Scalf, M., Frey, B. L., Cesnik, A. J., Solntsev, S., Schaffer, L. V., Smith, L. M. 2017; 16 (11): 4156-4165


    A proteoform family is a group of related molecular forms of a protein (proteoforms) derived from the same gene. We have previously described a strategy to identify proteoforms and elucidate proteoform families in complex mixtures of intact proteins. The strategy is based upon measurements of two properties for each proteoform: (i) the accurate proteoform intact-mass, measured by liquid chromatography/mass spectrometry (LC-MS), and (ii) the number of lysine residues in each proteoform, determined using an isotopic labeling approach. These measured properties are then compared with those extracted from a catalog of theoretical proteoforms containing protein sequences and localized post-translational modifications (PTMs) for the organism under study. A match between the measured properties and those in the catalog constitutes an identification of the proteoform. In the present study, this strategy is extended by utilizing a global PTM discovery database and is applied to the widely studied model organism Escherichia coli, providing the most comprehensive elucidation of E. coli proteoforms and proteoform families to date.

    View details for DOI 10.1021/acs.jproteome.7b00516

    View details for Web of Science ID 000414724100021

    View details for PubMedID 28968100

    View details for PubMedCentralID PMC5679780

  • HyCCAPP as a tool to characterize promoter DNA-protein interactions in Saccharomyces cerevisiae. Genomics Guillen-Ahlers, H., Rao, P. K., Levenstein, M. E., Kennedy-Darling, J., Perumalla, D. S., Jadhav, A. L., Glenn, J. P., Ludwig-Kubinski, A., Drigalenko, E., Montoya, M. J., Goring, H. H., Anderson, C. D., Scalf, M., Gildersleeve, H. S., Cole, R., Greene, A. M., Oduro, A. K., Lazarova, K., Cesnik, A. J., Barfknecht, J., Cirillo, L. A., Gasch, A. P., Shortreed, M. R., Smith, L. M., Olivier, M. 2016; 107 (6): 267-273


    Currently available methods for interrogating DNA-protein interactions at individual genomic loci have significant limitations, and make it difficult to work with unmodified cells or examine single-copy regions without specific antibodies. In this study, we describe a physiological application of the Hybridization Capture of Chromatin-Associated Proteins for Proteomics (HyCCAPP) methodology we have developed. Both novel and known locus-specific DNA-protein interactions were identified at the ENO2 and GAL1 promoter regions of Saccharomyces cerevisiae, and revealed subgroups of proteins present in significantly different levels at the loci in cells grown on glucose versus galactose as the carbon source. Results were validated using chromatin immunoprecipitation. Overall, our analysis demonstrates that HyCCAPP is an effective and flexible technology that requires neither specific antibodies nor prior knowledge of locally occurring DNA-protein interactions and can now be used to identify changes in protein interactions at target regions in the genome in response to physiological challenges.

    View details for DOI 10.1016/j.ygeno.2016.05.002

    View details for Web of Science ID 000378623700007

    View details for PubMedID 27184763

    View details for PubMedCentralID PMC5017017

  • Elucidating Proteoform Families from Proteoform Intact-Mass and Lysine-Count Measurements. Journal of Proteome Research Shortreed, M. R., Frey, B. L., Scalf, M., Knoener, R. A., Cesnik, A. J., Smith, L. M. 2016; 15 (4): 1213-1221


    Proteomics is presently dominated by the "bottom-up" strategy, in which proteins are enzymatically digested into peptides for mass spectrometric identification. Although this approach is highly effective at identifying large numbers of proteins present in complex samples, the digestion into peptides renders it impossible to identify the proteoforms from which they were derived. We present here a powerful new strategy for the identification of proteoforms and the elucidation of proteoform families (groups of related proteoforms) from the experimental determination of the accurate proteoform mass and number of lysine residues contained. Accurate proteoform masses are determined by standard LC-MS analysis of undigested protein mixtures in an Orbitrap mass spectrometer, and the lysine count is determined using the NeuCode isotopic tagging method. We demonstrate the approach in analysis of the yeast proteome, revealing 8637 unique proteoforms and 1178 proteoform families. The elucidation of proteoforms and proteoform families afforded here provides an unprecedented new perspective upon proteome complexity and dynamics.

    View details for DOI 10.1021/acs.jproteome.5b01090

    View details for Web of Science ID 000373519900011

    View details for PubMedID 26941048

    View details for PubMedCentralID PMC4917391

  • Human Proteomic Variation Revealed by Combining RNA-Seq Proteogenomics and Global Post-Translational Modification (G-PTM) Search Strategy JOURNAL OF PROTEOME RESEARCH Cesnik, A. J., Shortreed, M. R., Sheynkman, G. M., Frey, B. L., Smith, L. M. 2016; 15 (3): 800-808


    Mass-spectrometry-based proteomic analysis underestimates proteomic variation due to the absence of variant peptides and posttranslational modifications (PTMs) from standard protein databases. Each individual carries thousands of missense mutations that lead to single amino acid variants, but these are missed because they are absent from generic proteomic search databases. Myriad types of protein PTMs play essential roles in biological processes but remain undetected because of increased false discovery rates in variable modification searches. We address these two fundamental shortcomings of bottom-up proteomics with two recently developed software tools. The first consists of workflows in Galaxy that mine RNA sequencing data to generate sample-specific databases containing variant peptides and products of alternative splicing events. The second tool applies a new strategy that alters the variable modification approach to consider only curated PTMs at specific positions, thereby avoiding the combinatorial explosion that traditionally leads to high false discovery rates. Using RNA-sequencing-derived databases with this Global Post-Translational Modification (G-PTM) search strategy revealed hundreds of single amino acid variant peptides, tens of novel splice junction peptides, and several hundred posttranslationally modified peptides in each of ten human cell lines.

    View details for DOI 10.1021/acs.jproteome.5b00817

    View details for Web of Science ID 000371754100014

    View details for PubMedID 26704769

    View details for PubMedCentralID PMC4779408

  • Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation ANNUAL REVIEW OF ANALYTICAL CHEMISTRY, VOL 9 Sheynkman, G. M., Shortreed, M. R., Cesnik, A. J., Smith, L. M., Bohn, P. W., Pemberton, J. E. 2016; 9: 521-545


    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

    View details for DOI 10.1146/annurev-anchem-071015-041722

    View details for Web of Science ID 000379328100023

    View details for PubMedID 27049631

    View details for PubMedCentralID PMC4991544

  • Electrochemical Synthesis of Binary and Ternary Niobium-Containing Oxide Electrodes Using the p-Benzoquinone/Hydroquinone Redox Couple LANGMUIR Papa, C. M., Cesnik, A. J., Evans, T. C., Choi, K. 2015; 31 (34): 9502-9510


    New electrochemical synthesis methods have been developed to obtain layered potassium niobates, KNb3O8 and K4Nb6O17, and perovskite-type KNbO3 as film-type electrodes. The electrodes were synthesized from aqueous solutions using the redox chemistry of p-benzoquinone and hydroquinone to change the local pH at the working electrode to trigger deposition of desired phases. In particular, this study is the first to demonstrate the use of electrochemically generated acid, produced via the oxidation of hydroquinone, for inorganic film deposition. The layered potassium niobates could be converted to (H3O)Nb3O8 and (H3O)4Nb6O17 by cationic exchange, which, in turn, could be converted to Nb2O5 by heat treatment. The versatility of the new deposition method was further demonstrated for the formation of CuNb2O6 and AgNbO3, which were prepared by the deposition of KNb3O8 and transition metal oxides, followed by thermal and chemical treatments. Considering the lack of solution-based synthesis methods for Nb-based oxide films, the methods reported in this study will contribute greatly to studies involving the synthesis and applications of Nb-based oxide electrodes.

    View details for DOI 10.1021/acs.langmuir.5b01665

    View details for Web of Science ID 000360773000027

    View details for PubMedID 26293515