Noncanonical open reading frames encode functional proteins essential for cancer cell survival
2021; 39 (6): 697-+
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442 (renamed glycine-rich extracellular protein-1, GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
View details for DOI 10.1038/s41587-020-00806-2
View details for Web of Science ID 000612593200001
View details for PubMedID 33510483
View details for PubMedCentralID PMC8195866
Reply: Matters Arising 'Investigating sources of inaccuracy in wearable optical heart rate sensors'.
NPJ digital medicine
2021; 4 (1): 39
Cas9 activates the p53 pathway and selects for p53-inactivating mutations
2020; 52 (7): 662-+
Cas9 is commonly introduced into cell lines to enable CRISPR-Cas9-mediated genome editing. Here, we studied the genetic and transcriptional consequences of Cas9 expression itself. Gene expression profiling of 165 pairs of human cancer cell lines and their Cas9-expressing derivatives revealed upregulation of the p53 pathway upon introduction of Cas9, specifically in wild-type TP53 (TP53-WT) cell lines. This was confirmed at the messenger RNA and protein levels. Moreover, elevated levels of DNA repair were observed in Cas9-expressing cell lines. Genetic characterization of 42 cell line pairs showed that introduction of Cas9 can lead to the emergence and expansion of p53-inactivating mutations. This was confirmed by competition experiments in isogenic TP53-WT and TP53-null (TP53-/-) cell lines. Lastly, Cas9 was less active in TP53-WT than in TP53-mutant cell lines, and Cas9-induced p53 pathway activation affected cellular sensitivity to both genetic and chemical perturbations. These findings may have broad implications for the proper use of CRISPR-Cas9-mediated genome editing.
View details for DOI 10.1038/s41588-020-0623-4
View details for Web of Science ID 000533846800003
View details for PubMedID 32424350
View details for PubMedCentralID PMC7343612
Adding to the CASeload: unwarranted p53 signaling induced by Cas9.
Molecular & cellular oncology
2020; 7 (5): 1789419
We investigated the genetic and transcriptional changes associated with Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) associated protein 9 (Cas9) expression in human cancer cell lines. For a subset of cell lines with a wild-type tumor protein TP53 (best known as p53), we detected p53 pathway activation, DNA damage accumulation and emerging p53-inactivating mutations following Cas9 introduction. We discuss the potential implications of our findings in basic and translational research.
View details for DOI 10.1080/23723556.2020.1789419
View details for PubMedID 32944644
View details for PubMedCentralID PMC7469564
A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles
2017; 171 (6): 1437-+
We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs, and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.
View details for DOI 10.1016/j.cell.2017.10.049
View details for Web of Science ID 000417362700023
View details for PubMedID 29195078
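The Connectivity Map abstract describes connecting perturbations by common gene-expression signatures. As a rough illustration of that idea only, the sketch below scores how well an expression profile matches a query signature by taking the mean change of the query's up-regulated genes minus the mean change of its down-regulated genes. This is a simplified stand-in, not the scoring method used by CMap or clue.io; the function name, gene labels, and values are all hypothetical.

```python
from statistics import mean
from typing import AbstractSet, Mapping


def connectivity_sketch(
    profile: Mapping[str, float],
    up_genes: AbstractSet[str],
    down_genes: AbstractSet[str],
) -> float:
    """Toy similarity between a query signature and an expression profile.

    profile maps gene name -> differential expression value (hypothetical data).
    A positive result means the profile moves the query's up genes up and its
    down genes down; a negative result means the profile opposes the query.
    """
    # Only genes actually present in the profile contribute to the score.
    up_values = [profile[g] for g in up_genes if g in profile]
    down_values = [profile[g] for g in down_genes if g in profile]
    if not up_values or not down_values:
        raise ValueError("query must overlap the profile in both directions")
    return mean(up_values) - mean(down_values)


# Hypothetical profile with four made-up genes.
example_profile = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": -3.0}
matching = connectivity_sketch(example_profile, {"GENE_A", "GENE_B"}, {"GENE_C", "GENE_D"})
opposing = connectivity_sketch(example_profile, {"GENE_C", "GENE_D"}, {"GENE_A", "GENE_B"})
```

Under these assumptions, a query that agrees with the profile scores positive and its mirror image scores negative by the same amount, which is the basic property any signature-matching score of this kind would need.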
A robust prognostic signature for hormone-positive node-negative breast cancer
2013; 5: 92
Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients who would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment-related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse-free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal out-of-bag (OOB) cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. RFRS allows flexibility in both the number and identity of genes utilized, from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.
View details for DOI 10.1186/gm496
View details for Web of Science ID 000326549600002
View details for PubMedID 24112773
View details for PubMedCentralID PMC3961800
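The RFRS abstract reports an ROC AUC and three risk groups defined by probability cutoffs. The sketch below shows, under stated assumptions, how those two pieces could be computed from a list of predicted relapse probabilities. It does not reproduce the published model: the cutoff values (0.3 and 0.6), the function names, and the example data are hypothetical, chosen only to make the example concrete.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], relapsed: Sequence[bool]) -> float:
    """ROC AUC via pairwise comparison.

    Equals the probability that a randomly chosen relapsed case receives a
    higher score than a randomly chosen non-relapsed case (ties count 0.5).
    """
    positives = [s for s, r in zip(scores, relapsed) if r]
    negatives = [s for s, r in zip(scores, relapsed) if not r]
    if not positives or not negatives:
        raise ValueError("need at least one relapsed and one non-relapsed case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def risk_group(probability: float, low_cutoff: float = 0.3, high_cutoff: float = 0.6) -> str:
    """Map a relapse probability to a risk group.

    The default cutoffs are placeholders, not the values used for RFRS.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low_cutoff:
        return "low"
    if probability < high_cutoff:
        return "intermediate"
    return "high"


# Hypothetical predicted probabilities and observed outcomes for four patients.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_relapsed = [True, False, True, False]
example_auc = roc_auc(example_scores, example_relapsed)
example_groups = [risk_group(s) for s in example_scores]
```

The pairwise form of the AUC is quadratic in the number of patients, which is acceptable for a small illustration; a rank-based implementation would be the usual choice for large cohorts.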
Modeling precision treatment of breast cancer
2013; 14 (10): R110
First-generation molecular profiles for human breast cancers have enabled the identification of features that can predict therapeutic response; however, little is known about how the various data types can best be combined to yield optimal predictors. Collections of breast cancer cell lines mirror many aspects of breast cancer molecular pathobiology, and measurements of their omic and biological therapeutic responses are well-suited for development of strategies to identify the most predictive molecular feature sets. We used least squares-support vector machines and random forest algorithms to identify molecular features associated with responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. The datasets analyzed included measurements of copy number aberrations, mutations, gene and isoform expression, promoter methylation and protein expression. Transcriptional subtype contributed strongly to response predictors for 25% of compounds, and adding other molecular data types improved prediction for 65%. No single molecular dataset consistently outperformed the others, suggesting that therapeutic response is mediated at multiple levels in the genome. Response predictors were developed and applied to TCGA data, and were found to be present in subsets of those patient samples. These results suggest that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit. Further, we suggest a systems biology strategy for guiding clinical trials so that patient cohorts most likely to respond to new therapies may be more efficiently identified.
View details for DOI 10.1186/gb-2013-14-10-r110
View details for Web of Science ID 000329387500003
View details for PubMedID 24176112
View details for PubMedCentralID PMC3937590